Design of Low Power Applications using Inexact Logic Circuits
By
Bharghava R
200741005
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science (by Research) in
VLSI & Embedded Systems
Centre for VLSI & Embedded Systems Technologies International Institute of Information Technology
Hyderabad, India May 2010
Copyright © 2010 Bharghava R All Rights Reserved
Dedicated to my parents, Uma and Rajaram, without whom I would not be…
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled “Design of Low Power
Applications using Inexact Logic Circuits” by Bharghava R (200741005) submitted in
partial fulfilment for the award of the degree of Master of Science (by Research) in VLSI
& Embedded Systems, has been carried out under our supervision and it is not submitted
elsewhere for a degree.
_____________
Date

_____________
Advisor: Dr. Suresh Purini
Asst. Professor, IIIT, Hyderabad

_____________
Date

_____________
Advisor: Prof. Govindarajulu
Professor, IIIT, Hyderabad
Acknowledgements
I owe my deepest gratitude to my advisors, Dr. Suresh Purini and Professor
Govindarajulu, whose constant encouragement, guidance and support enabled
me to accomplish this work.
I also thank Prof. M. Satyam for his feedback on various aspects of my work. I
am exceptionally thankful to Abinesh for his valuable help and feedback on my
work. I would like to thank Avi Dullu and Mukund Ramakrishna for their
contribution to this thesis. I would also like to thank all my friends in the CVEST lab for the
terrific company during my study.
Finally, I want to thank my family for their unconditional love. Their constant
encouragement and their faith in me have always given me the strength to try to
achieve more and to be a better person.
Abstract
Ever since the introduction of integrated circuits into mainstream usage,
numerous research efforts have been made into optimizing the circuits
implemented on silicon with respect to Power, Area, Time (PAT), and, most
recently, reliability. Due to the recent surge in mobile devices, low power
design techniques are being given more emphasis. Power dissipation can be
reduced at the device, circuit, system architecture, or software design levels.
This work focuses on reducing power dissipation in electronic systems
by using inexact logic to implement systems where error can be tolerated
or neglected. An inexact logic circuit is
constructed by selectively converting minterms/maxterms of its Boolean function
into don’t cares. The intuition in doing this is that by converting only a small
fraction of minterms/maxterms into don’t-cares, an inexact version of the Boolean
function can be synthesized with a significantly lower area-power-delay footprint
than the exact Boolean function.
Decision making circuits, which are generally sensitive to errors, have
been chosen as the subject of analysis. Several applications are presented
where inexactness can be applied, and their performance, in terms of power and
system accuracy, is quantified with varying levels of inexactness. This thesis also
outlines a set of general guidelines to design inexact circuits on a per-case basis.
To simplify the process, a heuristic framework to generate inexact circuits with
varying levels of inexactness has been implemented.
The above mentioned applications are classified into three categories based
on the impact of the decision made on system accuracy: no impact, tolerable
impact, and significant impact. For applications with no impact, the only
deciding criteria in introducing inexactness are the overall improvements in
power and speed. For applications under the
tolerable impact category, a system accuracy parameter should also be
considered in validating an inexact circuit. For applications with significant impact,
inexactness is seldom tolerated, as system accuracy is critical.
Bus coding, Median Filter based Image blurring, and N-Modular
Redundancy (NMR), falling under the no impact, tolerable impact, and significant
impact categories respectively, share the majority voter as a common decision
circuit. A Least Recently Used (LRU)-variant replacement algorithm for
Translation Lookaside Buffers (TLB), and a DCT threshold based image blurring
process, with the former having no impact on system accuracy and the latter
having tolerable impact, share a comparator as a decision circuit. Different
inexact versions of these decision circuits are generated and their performance
parameters with respect to the applications are measured.
A comparison is made between the levels of inexactness, the power
dissipation of the system, and the system performance. The results
obtained promise a drastic reduction in power dissipation, up to 300%, with
tolerable deviation in system accuracy. Although critical path delay was not a
parameter for optimization, a gain of up to 30% was observed on this front. The
chip area and static power dissipation are also reduced significantly.
The results obtained validate the use of inexact logic in applications which
are either impervious or tolerant to the errors produced due to the inexactness.
This provides a new avenue for low power system design, which can be used in
conjunction with other circuit level and system level design techniques.
Contents

Contents
List of Tables
List of Figures
List of Relevant Publications

Chapter 1: Introduction
1.1 Power Dissipation in Integrated Circuits
1.2 Low Power Systems
1.2.1 System Architecture Level
1.2.2 Circuit Level
1.3 Proposed Technique – Inexact Circuit Design
1.4 Organization of the Thesis

Chapter 2: For Basic Understanding
2.1 Digital Electronics
2.2 Circuit Minimization
2.3 Truth Table
2.4 Karnaugh Map
2.5 Genetic Algorithms
2.6 Hamming Threshold Voter
2.7 Digital Comparator

Chapter 3: Inexact Circuit Design
3.1 Related Work
3.2 Concept of Inexactness

Chapter 4: Applications under the No Impact Category
4.1 Bus Coding
4.2 Translation Lookaside Buffer (TLB)

Chapter 5: Applications under the Tolerable Impact Category
5.1 Majority Voter based Rank Order Median Filter
5.2 Frequency based Image Blurring

Chapter 6: Design Methodology of Inexact Circuits
6.1 K-Map based Approach
6.2 Heuristic Framework for Inexact Circuit Generation
6.3 Other Methods to Design Inexact Circuits

Chapter 7: Results
7.1 Bus Coding
7.2 Rank Order based Median Filter
7.3 Frequency based Blur Filter
7.4 Translation Lookaside Buffer (TLB)

Chapter 8: Conclusions
8.1 Summary of Work
8.2 Applications affected by Inexactness
8.3 Inference from Results

Bibliography

List of Tables

Table 2.1 Bus Invert Decision Making
Table 2.2 Bus Invert Decision Encoder

List of Figures

Figure 2.1 Example of Circuit Minimization
Figure 2.2 Truth Table & K-Map of a Full Adder
Figure 2.3 A four variable minterm Karnaugh map
Figure 2.4 4 set Venn diagram with numbers (0-15) and set names (A-D)
Figure 3.1 (a) Exact K-Map (b) Inexact K-Map
Figure 4.1 Bus Invert Block Diagram
Figure 4.2 K-Map of an Exact Majority Voter
Figure 4.3 K-Map of an Inexact Majority Voter
Figure 4.4 8-bit Majority Voter Circuit
Figure 4.5 3-out-of-4 Block
Figure 4.6 8-bit Inexact Majority Voter Circuit
Figure 4.7 High Level Architecture for the Transition Inversion scheme
Figure 4.8 Aging Page Replacement Algorithm Illustrated
Figure 5.1 Rank-order filter algorithm (median detection, for n=5)
Figure 5.2 Complete diagram of the bit-serial generic rank filter
Figure 5.3 Diagram of the logic unit (LU)
Figure 5.4 Exact Comparator
Figure 5.5 Inexact Comparator
Figure 7.1 Percentage reduction in transitions with varying inexactness
Figure 7.2 Dynamic Power Dissipation varying with inexactness (8-bit)
Figure 7.3 Circuit Area varying with inexactness (8-bit)
Figure 7.4 Critical Path Delay varying with inexactness (8-bit)
Figure 7.5 Comparison of overall power saved by the Bus Invert with inexactness
Figure 7.6 Dynamic Power Dissipation varying with inexactness (9-bit)
Figure 7.7 Circuit Area varying with inexactness (9-bit)
Figure 7.8 Blur metric varying with inexactness (Median Filter)
Figure 7.9 Blur metric varying with inexactness (Median Filter - Hybrid)
Figure 7.10 Blurred Image Samples
Figure 7.11 Circuit Area varying with Inexactness (Comparator)
Figure 7.12 Dynamic Power Dissipation varying with Inexactness (Comparator)
Figure 7.13 Blur metric varying with inexactness (DCT based Filter)
Figure 7.14 Page Fault varying with Inexactness
Figure 7.15 Circuit Area varying with inexactness (Comparator)
Figure 7.16 Dynamic Power Dissipation varying with inexactness (Comparator)
List of Relevant Publications
• Bharghava R., Abinesh R., Suresh Purini, Govindarajulu Regeti, “Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting”, selected for publication in a special issue of the Journal of Low Power Electronics, to appear in October 2010.
• Bharghava R., Abinesh R., Suresh Purini, Govindarajulu Regeti, “Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting”, 23rd International Conference on VLSI Design, January 2010.
• Abinesh R., Bharghava R., M.B. Srinivas, “Transition Inversion Based Low Power Data Coding Scheme for Synchronous Serial Communication”, ISVLSI 2009, pp. 103-108, IEEE Computer Society Annual Symposium on VLSI, 2009.
• Joint Winner of the Intel Research Challenge (also known as the Intel Scholar Program), 2008-2009. http://www.intel.com/cd/corporate/education/APAC/ENG/in/news/news43/419015.htm
Chapter 1
Introduction
The field of electronics has undergone several transformations in order to
cater to the needs of human society. Several innovations in device technology,
circuit design methodologies, and system architecture have resulted in
systems with increasing performance over the years. Critical parameters
considered during the design of electronic systems are the speed of operation
(in terms of operating frequency), the power dissipated by the system, the area
occupied by the circuit in silicon, and the reliability of the system. Earlier research
interest was in designing high speed systems, but this has given way to the
design of highly energy efficient systems [1-3] in the last half decade. Of late, the
advent of mobile devices and portable computing platforms, the increasing threat of
energy shortage, and e-junk in the form of batteries and other toxic substances
used in manufacturing have increased the emphasis on low power design.
Battery driven devices have made power consumption a significant parameter to
be incorporated into system design, rather than something addressed afterwards
by adding power management features.
The increased interest shown in low power systems is also a direct result of
the infiltration of Green Computing into mainstream electronic devices. Green
Computing or Green IT refers to environmentally sustainable computing or IT. It
is "the study and practice of designing, manufacturing, using, and disposing of
computers, servers, and associated subsystems—such as monitors, printers,
storage devices, and networking and communications systems—efficiently and
effectively with minimal or no impact on the environment.” Unlike energy
conservation in the broader sense, in the context of VLSI this involves the design
of systems consuming less power than their present counterparts.
1.1 Power Dissipation in Integrated Circuits
An integrated circuit chip contains many capacitive loads, formed both
intentionally (as is the case with gate to channel capacitance) and unintentionally
(between any conductors that are near each other but not electrically connected).
Changing the state of the circuit causes a change in the voltage across these
parasitic capacitances, which involves a change in the amount of stored energy.
As the capacitive loads are charged and discharged through resistive devices, an
amount of energy comparable to that stored in the capacitor is dissipated as heat,
as shown in the following expression:

E_stored = (1/2) C V^2

where E_stored is the energy stored in the capacitor in the form of electrostatic
charge, C is the capacitance formed as mentioned above, and V is the voltage of
operation. The power dissipated is given by the energy dissipated per unit time,
which adds the frequency f and the switching factor α to the right hand side of the
equation:

P = α f (1/2) C V^2

The switching factor α is the average number of state changes made at
that capacitance node per cycle; this, multiplied by the frequency, gives the
average number of transitions per unit time.
The result of heat dissipation on state change is to limit the amount of
computation that may be performed on a given power budget. While device
shrinkage can reduce some of the parasitic capacitances, the number of devices
on an integrated circuit chip has increased more than enough to compensate for
reduced capacitance in each individual device.
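The relation above can be illustrated numerically. The sketch below evaluates the dynamic power of a single capacitive node for illustrative values of capacitance, supply voltage, clock frequency, and switching factor (all of these numbers are assumptions, not figures from this thesis):

```python
# Dynamic power estimate: P = alpha * f * (1/2) * C * V^2,
# where the energy dissipated per transition is taken to be
# comparable to the stored energy E = (1/2) * C * V^2.

def dynamic_power(c_farads, v_volts, f_hertz, alpha):
    """Average switching power of one capacitive node."""
    energy_per_transition = 0.5 * c_farads * v_volts ** 2
    return alpha * f_hertz * energy_per_transition

# Illustrative values (assumed): 10 fF node, 1.2 V supply,
# 1 GHz clock, 0.1 transitions per cycle on average.
p = dynamic_power(10e-15, 1.2, 1e9, 0.1)
print(p)  # watts
```

Note that the supply voltage enters quadratically, which is why voltage scaling is such an effective low power lever.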
1.2 Low Power Systems
Design optimization is done at all the design levels involved. These design
levels are generally stacked in four distinct layers: device, circuit, system, and
software. Optimization of power and/or area often leads to degradation of speed,
and vice versa. The inverse relation between power and speed results in a design
trade-off, often leading to applications being classified into low power or high speed
scenarios. Almost all research in electronics is governed by, or rather guided or
motivated by, Moore's Law [4], which describes a long-term trend in the history of
computing hardware, according to which the number of transistors that can be
placed inexpensively on an integrated circuit has doubled approximately every
two years. In Moore's own words - “The complexity for minimum component
costs has increased at a rate of roughly a factor of two per year... Certainly over
the short term this rate can be expected to continue, if not to increase. Over the
longer term, the rate of increase is a bit more uncertain, although there is no
reason to believe it will not remain nearly constant for at least 10 years. That
means by 1975, the number of components per integrated circuit for minimum
cost will be 65,000. I believe that such a large circuit can be built on a single
wafer.”
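As a quick arithmetic check of the quoted prediction, assuming a 1965 baseline of roughly 64 components per chip (an illustrative assumption) and one doubling per year for ten years:

```python
# Ten annual doublings from an assumed 1965 baseline of 64 components.
components_1975 = 64 * 2 ** 10
print(components_1975)  # 65536, in line with the quoted 65,000
```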
While it is generally accepted that this exponential improvement trend will end,
it is unclear exactly how dense and fast integrated circuits will get by the time this
point is reached. Working devices have been demonstrated that were fabricated
with a MOSFET transistor channel length of 6.3 nanometres using conventional
semiconductor materials, and devices have been built that used carbon
nanotubes as MOSFET gates, giving a channel length of approximately one
nanometre. The density and computing power of integrated circuits are limited
primarily by power dissipation concerns.
Another obstacle to this trend [5], especially for mobile devices, is that
battery technology has been following a slower improvement trend. Recent
improvements in battery and process technology have been aimed at meeting
the increased energy demands of the up-and-coming portable systems. However,
it may be years before next-generation battery technologies, such as fuel cells,
become commercially viable. Hence research is being carried out at all levels of
the system stack to enable reduction in power consumption. Designers today
continue to be challenged with the need to manage power, timing and signal
integrity concurrently throughout the design flow. Traditional power optimization
techniques and today's power-aware design flows are proving insufficient in the
design of systems-on-a-chip (SoCs) for next-generation applications, and must
evolve to enable design for energy efficiency.
Device level optimization involves the design of newer devices with better
scaling and smaller device dimensions. But this may not last for long, since many
limiting factors come into the picture as the transistor feature size is reduced
[10]. Optimization at the circuit level involves the design of more efficient circuits to
implement the required logic. There are multiple circuit design techniques in
practice [16-17], which provide the desired power-delay characteristics. This also
involves implementation of newer logic offering more functionality [11]. Examples
of this include the design of universal BCD/Binary adders, bidirectional shift
registers, etc. System level optimization involves the architecture of the system,
at the micro and macro levels. Finally, if the system is programmable, the
software running on the system also plays a vital role in performance. There has
been considerable research on this front, yielding many standard techniques that,
when adopted, provide better performance results. Optimization techniques at
the system and circuit levels, which are of immediate concern, are
elaborated in the following paragraphs.
1.2.1 System Architecture Level
Work done at this level concentrates on system components as a whole.
This is generally done at the processor level, or as part of the operating system
governing the processor, if any. In the absence of an operating system the
system software is worked upon. A recent segment of computers, as opposed to
servers, desktops, and laptops, has become quite popular: the Netbook [6].
Netbooks are a branch of sub-notebooks, a rapidly evolving category of small,
lightweight, and inexpensive laptop computers suited for general computing and
accessing Web-based applications; they are often marketed as "companion
devices", i.e., to augment a user's primary computer access. These devices were
built around low power processors which shed some desirable architectural
features (e.g. VIA Nano, ARM, Intel Atom [7]). The performance per watt
(MHz/Watt or MIPS/Watt) has been accepted as a metric of comparison. This is
exemplified in the new list of top 500 environmentally efficient supercomputers
[8][9], in addition to the top 500 supercomputer list, which is ordered in terms of
efficiency in energy consumption rather than sheer performance.
1.2.2 Circuit Level
In another direction of research, power management protocols have been
developed incorporating various features that take into account circuit related
innovations. Some of the important features are voltage and frequency scaling
[11]. In these techniques, the clock frequency and core voltage of the processor
are changed depending on the required/expected performance of the processor.
This approach is limited by thermal noise within the circuit. There is a
characteristic voltage proportional to the device temperature and to the
Boltzmann constant, which the state switching voltage must exceed in order for
the circuit to be resistant to noise. This is typically on the order of 50–100 mV for
devices rated to 100 degrees Celsius external temperature (about 4kT/q, where T
is the device's internal temperature in kelvin, k is the Boltzmann constant, and q
is the electron charge).
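The order of magnitude of this noise floor can be reproduced with a short calculation; the factor of 4 and the choice of T = 300 K below are illustrative assumptions:

```python
# Thermal noise floor sketch: the switching voltage should exceed
# roughly 4*k*T/q to be resistant to thermal noise (assumed form;
# T = 300 K is an assumed internal temperature).

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C

def noise_floor_volts(t_kelvin, multiple=4.0):
    """Characteristic voltage a signal must exceed to resist thermal noise."""
    return multiple * K_BOLTZMANN * t_kelvin / Q_ELECTRON

print(round(noise_floor_volts(300.0) * 1000, 1))  # ~103.4 mV at 300 K
```

This lands near the upper end of the 50–100 mV range quoted above, which sets a hard floor on how far voltage scaling can go.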
Other popular techniques involve clock gating [12] and power gating [13] [14].
As per the Law of diminishing returns, any small improvement needed in the
output generally requires a large change in input once a high performance state
is achieved [15]. This is the case with system design in general, where design is
initially done with higher performance in mind, and a few parameters are
tweaked to obtain lower power dissipation. A better approach is to incorporate
low power features during the design phase. There are numerous circuit design
techniques, such as transistor sizing, logic optimization, activity driven power
down, low-swing logic, and adiabatic switching [16][17].
1.3 Proposed Technique – Inexact Circuit Design
Of late, a few unconventional methods of optimization have evolved, one of
which is the use of probabilistic logic [49]. This involves driving different parts of
the system with different source voltages with the presumption that the error
arising out of the Low-VDD operation can be tolerated. This has been
demonstrated for arithmetic circuits in areas of multimedia processing. Also the
error in this methodology is not under the designer's or user's control, but is
governed by a probability related to the device characteristics, and operating
conditions. This probability can however be theoretically modelled, in simpler
cases, or can be characterized empirically.
In this work, an orthogonal level of design optimization is proposed to
reduce the power dissipation in a system, by compromising on the veracity of the
system output. The term orthogonal is used in the sense that any other traditional
methodology can be applied in conjunction with the proposed technique. By
introducing a predetermined error, or inexactness, in the functionality of the
components of the system, the amount of hardware required to implement these
components can be reduced. This error can be introduced at the algorithm level,
logic design level, circuit level, or at the device level.
This work focuses on obtaining lower power dissipation using inexactness
at the logic layer, by analyzing the effect of using an inexact logic function on the
power and critical delay of the circuit in question, and the impact of inexactness
on the system performance. Here system performance is defined as the extent to
which the system with an inexact function can approximate the original system
function. Power reduction is achieved by replacing the required circuit function
with another function whose input-to-output relation is similar to, but not
necessarily a subset or superset of, the original. The functional difference
between the alternate circuit and the original circuit, or the level of inexactness,
can vary depending on the application. The field of work is further narrowed by considering
decision circuits in applications that are not affected by, or can tolerate,
inaccuracy. Low power applications are designed using these inexact logic
circuits.
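As a small illustration of the idea (a sketch, not one of the exact circuits used later in this thesis), consider a 3-input majority function in which one minterm is converted into a don't care; assigning that don't care suitably lets the function be synthesized with less logic, at the cost of an occasional wrong output:

```python
# Sketch of inexactness: a 3-input majority function with the
# minterm (a=0, b=1, c=1) converted into a don't care. Assigning
# that don't care to 0 lets the function collapse to "a and (b or c)",
# which needs noticeably less logic than the exact majority.
from itertools import product

def exact_majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def inexact_majority(a, b, c):
    return a & (b | c)

errors = sum(
    exact_majority(a, b, c) != inexact_majority(a, b, c)
    for a, b, c in product([0, 1], repeat=3)
)
print(errors, "of 8 input patterns differ")  # 1 of 8
```

Here one of the eight input patterns produces a wrong answer, while the gate count drops from five two-input gates to two.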
The idea was constructed as part of a circuit design required for a serial
data coding scheme [18] for low power transmission. For feasibility of the coding
scheme, an inexact version of the decision circuit was designed to facilitate
power reduction, using a Hamming threshold comparator as an example [19].
Additional applications supporting inexactness and a heuristic design framework
have also been presented [20]. Further optimization at the circuit and system
levels can follow this procedure. The proposed concept is explained in later sections,
followed by a design methodology.
For the purpose of this work the circuits used are an inexact Hamming
threshold voter, and a comparator. Applications are designed using these inexact
logic circuits, and are analyzed with varying levels of inexactness. These
applications are categorized according to their tolerance to the error introduced
due to the inexactness. A comparison between the levels of inexactness and the
power dissipation of the system and the system performance is presented in the
results section. The terms inexact circuit and inexact logic are used
interchangeably in this thesis.
1.4 Organization of the Thesis
The first chapter, so far, introduced the importance of low power design in
today's perspective, and provided a glimpse at the proposed technique in
addressing the issue of power dissipation. The rest of the thesis is organized as
follows:
• Chapter 2 provides the basic information required to aid in understanding
the concept and the work process.
• Chapter 3 explains the concept of inexactness, while also providing an
insight into the effect of circuit inexactness on the system.
• Chapter 4 covers a set of applications that are not affected by inexact
circuits, as the inexactness does not fall in the datapath of the system, or
can be rectified through the inexact decision.
• Chapter 5 covers application scenarios where the error introduced by
inexactness can be tolerated.
• Chapter 6 presents manual and automated design methodologies to build
inexact circuits.
• Chapter 7 elaborates on the power performance results of the designed
inexact circuits, and the impact of inexactness on system accuracy.
• Chapter 8 concludes the thesis by summarizing the results, and inferring
the advantages of inexact logic in low power system design.
Chapter 2
For Basic Understanding
This chapter provides a basic understanding of the concepts involved, to give
the reader a better perspective.
2.1 Digital Electronics
Digital circuits [21] are electronic circuits based on a number of discrete
voltage levels. They are the most common physical representation of Boolean
algebra and are the basis of all digital computers. The terms "digital circuit",
"digital system" and "logic" are interchangeable in the context of digital circuits.
Most digital circuits use two voltage levels labelled "Low" and "High". Often "Low"
will be near zero volts and "High" will be at a higher level nearer to the supply
voltage in use. The fundamental advantage of digital techniques stems from the
fact that it is easier to get an electronic device to switch into one of a number of
known states than to accurately reproduce a continuous range of values.
Computers, digital signal processors, programmable logic controllers
(used to control industrial processes), cell phones, audio players, are examples
of applications constructed around digital circuits. Digital electronics are usually
made from large assemblies of logic gates, which are simple electronic
representations of Boolean logic functions. A Boolean function describes how to
determine a Boolean value output based on some logical calculation from
Boolean inputs. Such functions play a basic role in questions of complexity theory
as well as the design of circuits and chips for digital computers. Engineers use
many methods to minimize logic functions, in order to reduce the circuit's
complexity. When the complexity is lower, the circuit has fewer errors and
less electronics, and is therefore less expensive. This also results in lower
power utilization during operation. Historically, binary decision diagrams,
Quine–McCluskey [22][23] algorithm (automated), truth tables, Karnaugh Maps
[24], and Boolean algebra have been used, to aid in the process of simplification.
The most widely used practical simplification is a minimization algorithm like the
Espresso [25] heuristic logic minimizer within a CAD system. Other heuristic
techniques like genetic algorithms, and swarm intelligence are also used [28-32].
2.2 Circuit Minimization
In Boolean algebra, circuit minimization is the problem of obtaining the
smallest logic circuit (Boolean formula) that represents a given Boolean function
or truth table. The general circuit minimization problem is believed to be
intractable [33][34], but there are effective heuristics such as Karnaugh maps [24]
and the Quine–McCluskey algorithm [22][23] that facilitate the process. The
problem with having a complicated circuit (i.e. one with many elements, such as
logical gates) is that each element takes up physical space in its implementation
and costs time and money to produce in itself.
While there are many ways to minimize a circuit [33], this is an example
that minimizes (or simplifies) a Boolean function. Note that the Boolean function
carried out by the circuit in Figure 2.1 is used to compute the expression given by
(A’ and B) or (A and B’). It is evident that two negations, two conjunctions, and a
disjunction are used in this statement. This means that to build the circuit one
would need two inverters, two AND gates, and an OR gate.
Figure 2.1. Example of Circuit Minimization
We can simplify (minimize) the circuit by applying logical identities or
using intuition. Since the function is true when A is true and B is false, or the
other way around, we can conclude that it simply tests whether A is not equal to B. In
terms of logical gates, inequality simply means an XOR gate (exclusive or).
Therefore, the two circuits shown in the figure are equivalent.
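The equivalence of the two circuits can be checked exhaustively. The following Python sketch (an illustration, not part of the original design flow) compares both implementations over all input combinations:

```python
from itertools import product

def original_circuit(a, b):
    # (A' AND B) OR (A AND B'): two inverters, two AND gates, one OR gate
    return (not a and b) or (a and not b)

def minimized_circuit(a, b):
    # A single XOR gate
    return a != b

# Exhaustive check over all four input combinations
for a, b in product([False, True], repeat=2):
    assert original_circuit(a, b) == minimized_circuit(a, b)
print("circuits are equivalent")
```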
2.3 Truth Table
A truth table is a mathematical table used in logic—specifically in
connection with Boolean algebra, and Boolean function—to compute the
functional values of logical expressions on each of their functional arguments,
that is, on each combination of values taken by their logical variables [35].
Practically, a truth table is composed of one column for each input variable
(for example, A and B), and one final column for all of the possible results of the
logical operation that the table is meant to represent (for example, A OR B). Each
row of the truth table therefore contains one possible configuration of the input
variables (for instance, A=true B=false), and the result of the operation for those
values. A full adder’s truth table is shown in Figure 2.2, along with its K-Map
minimization. K-Map minimization is elaborated in the next section.
Figure 2.2. Truth Table & K-Map of a Full Adder
Truth tables are a simple and straightforward way to encode Boolean
functions. However, given the exponential growth in size as the number of inputs
increases, they are not suitable for functions with a large number of inputs. Other
representations which are more memory efficient are textual equations and
binary decision diagrams. In digital electronics, truth tables can be used to
reduce basic Boolean operations to simple correlations of inputs to outputs,
without the use of logic gates or code.
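As an illustration, the full adder's truth table from Figure 2.2 can be regenerated programmatically. This Python sketch is illustrative only; the function names are not from the thesis:

```python
from itertools import product

def full_adder(a, b, cin):
    """One full-adder stage: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

# Print one truth-table row per input combination
print("A B Cin | Sum Cout")
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b}  {cin}  |  {s}    {cout}")
```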
2.4 Karnaugh Map
The Karnaugh map [24] (K-map for short), Maurice Karnaugh's 1953
refinement of Edward Veitch's 1952 Veitch diagram, is a method to simplify
Boolean algebra expressions. The Karnaugh map reduces the need for extensive
calculations by taking advantage of humans' pattern-recognition capability,
permitting the rapid identification and elimination of potential race conditions.
In a Karnaugh map the Boolean variables are transferred (generally from a
truth table) and ordered according to the principles of Gray code in which only
one variable changes in value in between squares along rows/columns. Once the
table is generated and the output possibilities are transcribed, the data is
arranged into the largest possible groups containing 2^n cells (n = 0, 1, 2, 3, ...) and the
minterm is generated through the axiom laws of Boolean algebra.
The size of the Karnaugh map with 'n' Boolean variables is determined by 2^n. The
size of a group within a Karnaugh map with 'n' Boolean variables and 'k'
literals in the corresponding product term is determined by 2^(n-k). A
generic 4 variable K-Map is shown in Figure 2.3.
Figure 2.3. A four variable minterm Karnaugh map
Normally, extensive calculations are required to obtain the minimal
expression of a Boolean function; however Karnaugh mapping reduces the need
for such calculations by:
• Taking advantage of the human brain's pattern-matching capability to
decide which terms should be combined to obtain the simplest expression.
• Permitting the rapid identification and elimination of potential race hazards,
which are difficult to spot in Boolean equations.
• Providing an excellent aid for simplification of up to six variables, however
with more variables it becomes more difficult to discern optimal patterns.
• Helping to teach about Boolean functions and minimization.
• For problems involving more than six variables, solving the Boolean
expressions is preferred over the use of a Karnaugh mapping.
Karnaugh maps generally become more cluttered and hard to interpret
when adding more variables. A general rule is that Karnaugh maps work well for
up to four variables, and shouldn't be used at all for more than six variables. For
expressions with larger numbers of variables, the Quine–McCluskey algorithm
can be used.
When the Karnaugh map has been completed, to derive a minimized
function the "1s" or desired outputs are grouped into the largest possible
rectangular groups in which the number of grid boxes (output possibilities) in the
groups must be equal to a power of 2. For example, the groups may be 4 boxes
in a line, 2 boxes high by 4 boxes long, 2 boxes by 2 boxes, and so on. "Don't
care" possibilities (generally represented by an "X") are grouped only if the
resulting group is larger than it would be with the "don't care" excluded. A box
may be used in more than one group if doing so reduces the total number of groups.
Each "1" or desired output possibility must be contained within at least one
grouping.
Figure 2.4. 4 set Venn diagram with numbers (0-15) and set names (A-D)
The groups generated are converted to a Boolean expression by: locating
and transcribing the variable possibility attributed to the box, and by the axiom
laws of Boolean algebra—in which if the (initial) variable possibility and its
inverse are contained within the same group the variable term is removed. Each
group provides a "product" to create a "sum-of-products" in the Boolean
expression. To determine the inverse of the Karnaugh map, the "0s" are grouped
instead of the "1s". The two expressions are non-complementary.
Each square in a Karnaugh map corresponds to a minterm (and maxterm).
The picture in Figure 2.4 shows the location of each minterm on the map. A Venn
diagram of four sets—labeled A, B, C, and D—is shown to the right that
corresponds to the 4-variable K-map of minterms just above it:
• Variable A of the K-map corresponds to set A in the Venn diagram; etc.
• Minterm m0 of the K-map corresponds to area 0 in the Venn diagram; etc.
• Minterm m9, i.e. AB'C'D (binary 1001), in the K-map corresponds only to where
sets A & D (but not B or C) intersect in the Venn diagram.
Thus, a specific minterm identifies a unique intersection of all four sets.
The Venn diagram can include an infinite number of sets and still correspond to
the respective Karnaugh maps. With increasing number of sets and variables,
both Venn diagram and Karnaugh map increase in complexity to draw and
manage. The grid is toroidally connected, so the rectangular groups can wrap
around edges. For example m9 can be grouped with m1; just as m0, m8, m2,
and m10 (the four corner cells) can be combined into a two-by-two group.
2.5 Genetic Algorithms
Genetic algorithms [36] are implemented as a computer simulation in which
a population of abstract representations (called chromosomes, or the genotype of
the genome) of candidate solutions (called individuals, creatures, or phenotypes)
to an optimization problem evolves toward better solutions.
Traditionally, solutions are represented in binary as strings of 0s and 1s, but
other encodings are also possible. The evolution usually starts from a population
of randomly generated individuals and happens in generations. In each
generation, the 'fitness' of every individual in the population is evaluated, multiple
individuals are stochastically selected from the current population (based on their
fitness), and modified (recombined and possibly randomly mutated) to form a
new population. The new population is then used in the next iteration of the
algorithm. Commonly, the algorithm terminates when either a maximum number
of generations has been produced, or a satisfactory fitness level has been
reached for the population. The algorithm is shown below.
Algorithm 1: Basic Genetic Algorithm _
1. Generate an initial population.
2. Calculate the fitness function for each individual.
3. repeat
3.1. Select two parents from individuals of last generation for crossover.
3.2. Cross individuals with a probability.
3.3. Mutate both parents with a probability.
3.4. Calculate the fitness for the mutated individuals.
3.5. Insert the mutated individuals in the new generation.
4. until convergence
_ _
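Algorithm 1 can be sketched in Python as follows. The OneMax fitness function (counting 1s in the chromosome) is a toy stand-in assumption; the thesis applies genetic algorithms to circuit cost metrics instead, and the parameter values are arbitrary:

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=20, generations=100,
                      p_cross=0.9, p_mut=0.05):
    # Step 1: generate an initial population of random bit strings
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: calculate the fitness for each individual
        ranked = sorted(pop, key=fitness, reverse=True)
        new_pop = [ranked[0][:], ranked[1][:]]   # elitism: keep the best two
        while len(new_pop) < pop_size:
            # Step 3.1: select two parents (tournament selection)
            p1 = max(random.sample(pop, 3), key=fitness)
            p2 = max(random.sample(pop, 3), key=fitness)
            # Step 3.2: cross the individuals with probability p_cross
            if random.random() < p_cross:
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Step 3.3: mutate each bit with probability p_mut
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < p_mut:
                        child[i] ^= 1
                # Steps 3.4-3.5: insert mutated individuals into the new generation
                new_pop.append(child)
        # Step 4: repeat until the generation budget is exhausted
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Toy fitness: number of 1s in the chromosome (OneMax)
best = genetic_algorithm(sum)
print(sum(best))
```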
If the algorithm has terminated due to a maximum number of generations,
a satisfactory solution may or may not have been reached. The structure shown
above is common to most evolutionary optimization techniques. Genetic algorithms
have been used in circuit minimization as shown in [28][29]. Most genetic
algorithm based circuit designs use a Cartesian Genetic Programming array,
where the genotype encodes the input-output relations of, and the
interconnections among, a matrix of programmable logic functions.
2.6 Hamming Threshold Voter
Let H(X) denote the Hamming weight of an n-bit binary vector X = {x1, x2, .
. . , xn}, i.e. the number of 1’s in it. Here we consider circuits that compare H(X) to
a fixed threshold 'k'. The output of the voter is a single bit indicating whether the
number of 1's is above the threshold or not. Some Hamming weight comparators
were proposed recently in [2.17][2.18]. They can also be designed using two n-bit
counters of 1’s and a comparator of K-bit integers, where K is the logarithm of the
bit length; for the most complete survey of counters of 1’s see [39].
The numerous applications of such comparators include digital neural
networks [40], pattern matching and data compression [41][42] and median and
rank order filters [43][44]. Since a 50% threshold is generally used in the
applications considered for analysis, a special case of the Hamming threshold
voter, called the majority voter is used. This is also called a Hamming
Comparator in later chapters.
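Behaviourally, the voter reduces to a population count followed by a threshold comparison. A small Python sketch (illustrative only; the function names are not from the cited works):

```python
def hamming_weight(x):
    """H(X): the number of 1s in the binary vector X."""
    return bin(x).count("1")

def hamming_threshold_voter(x, k):
    """Output 1 if H(X) exceeds the fixed threshold k, else 0."""
    return 1 if hamming_weight(x) > k else 0

def majority_voter(x, n):
    """Special case with a 50% threshold for an n-bit vector."""
    return hamming_threshold_voter(x, n // 2)

print(majority_voter(0b10110111, 8))  # six 1s out of 8 bits -> 1
print(majority_voter(0b00010001, 8))  # two 1s out of 8 bits -> 0
```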
2.7 Digital Comparator
A digital comparator or magnitude comparator is a hardware electronic
device that takes two numbers as input in binary form and determines whether
one number is greater than, less than or equal to the other number. Comparators
are used in central processing units (CPUs) and microcontrollers. Examples of
digital comparators include the CMOS 4063 and 4585 and the TTL 7485 and
74682-'89. Consider two 4-bit binary numbers A and B such that A = A3A2A1A0
and B = B3B2B1B0. Here each subscript represents one of the digits in the
numbers.
The binary numbers A and B will be equal if all the pairs of significant
digits of both numbers are equal, i.e., A3 = B3, A2 = B2, A1 = B1 and A0 = B0. Since
the numbers are binary, the digits are either 0 or 1 and the boolean function for
equality of any two digits Ai and Bi can be expressed as:

xi = AiBi + Ai'Bi'

i.e., xi is the XNOR of Ai and Bi, so xi is 1 only if Ai and Bi are equal. For the equality of A and B, all xi
variables (for i=0,1,2,3) must be 1. So the equality condition of A and B can be
implemented using the AND operation as (A = B) = x3x2x1x0. The binary variable
(A=B) is 1 only if all pairs of digits of the two numbers are equal.
In order to manually determine the greater of two binary numbers, we
inspect pairs of similar weighted bits, starting from the most significant bit,
gradually proceeding towards lower significant bits until an inequality is found.
When an inequality is found, if the corresponding bit of A is 1 and that of B is 0
then we conclude that A>B.
This sequential comparison can be expressed logically as:

(A>B) = A3B3' + x3A2B2' + x3x2A1B1' + x3x2x1A0B0'
(A<B) = A3'B3 + x3A2'B2 + x3x2A1'B1 + x3x2x1A0'B0

(A>B) and (A<B) are output binary variables, which are equal to 1 when A>B or
A<B respectively. Often, it is required only to know the greater or the lesser of
two values. Higher order comparators are generally built with a series of smaller
comparators [45][46].
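The bitwise equality test and the most-significant-bit-first scan described above can be modelled in Python. This is an illustrative behavioural model under the section's definitions, not a gate-level design:

```python
def compare_4bit(A, B):
    """Magnitude comparison of two 4-bit numbers A = A3A2A1A0, B = B3B2B1B0."""
    a = [(A >> i) & 1 for i in range(3, -1, -1)]   # a[0] is the MSB, A3
    b = [(B >> i) & 1 for i in range(3, -1, -1)]
    x = [1 - (ai ^ bi) for ai, bi in zip(a, b)]    # xi = 1 iff Ai == Bi (XNOR)
    eq = int(all(x))                               # (A=B) = x3 x2 x1 x0
    gt = lt = 0
    prefix = 1                                     # all higher-order pairs equal so far
    for ai, bi, xi in zip(a, b, x):                # scan from the MSB downwards
        gt |= prefix & ai & (1 - bi)               # first unequal pair decides A>B
        lt |= prefix & (1 - ai) & bi               # ... or A<B
        prefix &= xi
    return gt, eq, lt

print(compare_4bit(0b1010, 0b0111))  # 10 vs 7 -> (1, 0, 0)
print(compare_4bit(0b0101, 0b0101))  # 5 vs 5  -> (0, 1, 0)
```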
Chapter 3
Inexact Circuit Design
This chapter elaborates on the proposed concept of inexactness, and
discusses related work in this field. To date, there has been little
investigation into the usage of inexact circuits in a variety of application scenarios.
3.1 Related Work
Inexact/Approximate systems are not a completely new concept.
Approximate or inexact solutions are found in abundance in the fields of
computer science and operations research. Here, approximation algorithms are
algorithms used to find approximate solutions to optimization problems.
Algorithms have been approximated before to solve NP-Hard problems [47][48].
Approximation algorithms are often associated with NP-hard problems; since it is
unlikely that there can ever be efficient polynomial time exact algorithms solving
NP-hard problems, one settles for polynomial time sub-optimal solutions.
Unlike heuristics, which usually only find reasonably good solutions
reasonably fast, one wants provable solution quality and provable run time
bounds. Ideally, the approximation is optimal up to a small constant factor (for
instance within 5% of the optimal solution). Approximation algorithms are
increasingly being used for problems where exact polynomial-time algorithms are
known but are too expensive due to the input size. A typical example for an
approximation algorithm is the one for vertex cover in graphs, which involves
finding an uncovered edge and adding both endpoints to the vertex cover, until
none remain. It is clear that the resulting cover is at most twice as large as the
optimal one. This is a constant factor approximation algorithm with a factor of 2.
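The vertex-cover procedure just described is short enough to sketch directly. This Python version is illustrative, with an arbitrary example edge list:

```python
def approx_vertex_cover(edges):
    """2-approximation: pick any uncovered edge, add both endpoints, repeat."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge is still uncovered
            cover.update((u, v))
    return cover

edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
cover = approx_vertex_cover(edges)
assert all(u in cover or v in cover for u, v in edges)  # every edge is covered
print(sorted(cover))  # -> [1, 2, 3, 4]
```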
Approximation is also carried out for non NP-hard problems, like in [98],
where the mean and median of a set of numbers is computed in an approximate
fashion to reduce the time complexity. Other approximate solutions of this kind
are used in [99-101], with applications ranging from string matching and pattern
recognition to optimization problems.
Approximating computation is common in digital filters [102] where the
data obtained on filtering is either rounded off or truncated to the nearest value.
Truncation is the process of dropping the last few Least Significant Bits (LSBs),
so that the final result can fit in the hardware register provided. Rounding
off is the process of approximating the data value to the nearest value at a given
precision. For example, 3.78 can be rounded off to 3.8, and 6.11 can be rounded
off to 6.1; here the second digit after the decimal point is rounded into the
first. Also, the coefficients chosen for these filters are approximated.
Inexactness has been generally avoided in circuit design until recent times.
This was because technology scaling and other techniques could provide the
required power budget. As scaling becomes stagnant, newer unorthodox
techniques of achieving design goals must be sought. There have been efforts in
alleviating the effect of process variation on VLSI circuits. Architectures have
been proposed [97] for certain applications where computation paths which
contribute less to the final result are made longer so that under process variation,
delay errors in these paths do not affect the final outcome significantly.
A recent development in low power system design was the advent of
probabilistic logic [49-51]. The authors here propose that arithmetic circuits can
be built to operate with error in applications that either benefit from (or harness)
probabilistic behavior at the device level, or can tolerate such behavior.
Probabilistic logic is achieved in the
form of noise during circuit operation. Circuits operated at a voltage closer to the
CMOS threshold voltage for that technology, tend to be affected more by noise.
Applications that can tolerate or utilize this noise can be designed using this
technique. In the former case, the examples considered include Bayesian Inference [52],
Probabilistic Cellular Automata [53], Random Neural Networks [54], and Hyper
Encryption [55]. In the domain of applications that tolerate probabilistic behavior,
they investigate applications which can trade energy and performance for
application-level quality of the solution.
Applications in the domain of digital signal processing were chosen, where
application-level quality of solution is naturally expressed in the form of signal-to-
noise ratio or SNR. In this context, the adders used in the filters in the H.264
decoding algorithm [56] were implemented in probabilistic logic by scaling the
voltage across the bit-length, with the voltage reducing towards the LSB. This
technique is elaborated here to avoid confusing it with the proposed inexactness.
The proposed inexactness is at logic level, and is predetermined and fixed
at the design stage. The inexactness is not a result of the operating conditions.
Whereas, in probabilistic logic, the error in the circuit arises due to the voltage
scaling, and circuit partitioning for proper voltage scaling, as well as its
analysis, is difficult. Also, the task of providing multiple voltage sources is a
design burden, because of which the number of voltages is generally restricted to
two.
An existing method to design approximate logic circuits is proposed in [57],
where certain 0-minterms are assumed as don’t-cares to form 1-approximate or
0-approximate circuits. The resultant circuit is a subset of the original circuit,
catering to a subset of the input vectors. The purpose of introducing this
approximation in the function is to reduce the area overhead in performing
concurrent error detection. The system accuracy is not compromised, as this
approximate circuit exists in conjunction with the original circuit. In the proposed
work, however, as the system accuracy is compromised, the system designer
has to be provided with a way to vary the inexactness of the circuit, so as to
achieve a satisfactory design trade off. This can be done with the heuristic
framework proposed later on. Also, comparison of the proposed framework with
this algorithm is redundant, as the solution obtained through the algorithm is part
of the design space being explored by the heuristic framework. It is of importance
to note the difference between introducing inexactness, and the use of don’t-
cares [34] in digital logic design. Don’t-cares are part of the system and are
included in the system specification. In digital logic, a don't-care term is an input-
sequence/vector to a Boolean function that the designer does not care about,
usually because that input would never happen, or because differences in that
input would not result in any changes to the output. By considering these don't-
care inputs, designers can potentially minimize their function much more so than
if the don't-care inputs were taken to have an output of all 0 or all 1. Examples of
don't-care terms are the binary values 1010 through 1111 (10 through 15 in
decimal) for a function that takes a BCD value, because a BCD value never takes
on values from 1010 to 1111. This is different from deliberately introducing an
error in the function by inverting existing minterms or maxterms. The concept of
inexactness is elaborated in the following section.
3.2 Concept of Inexactness
Design of Inexact circuits is the process of approximating the function of
the logic circuits to be optimized. To elaborate, assume the logic function to be
implemented has a certain set of minterms M. The final circuit implemented will
be a set of minterms M', similar to M, which need neither be a complete subset
nor a superset of M, nor even closely resemble it. The desirable characteristic for
M' is such that the circuit resulting from it requires much less power to operate (or
less delay), and the error arising out of it can be tolerated by the system.
Traditional design techniques assume exact operation of a circuit, according to its
specifications. In case of digital system design, the truth table of the system has
to be fully applicable to the circuit designed. Instead of designing an exact
system, certain inexactness can be introduced if it does not lead to unacceptable
performance. This can be either in terms of system accuracy, or human
perception.
Figure 3.1. (a) Exact K-Map (b) Inexact K-Map
For example, consider the function represented in the K-Map shown in
Figure 3.1a. The single ‘1’ that represents the term ab`c`d` involves multiple
gates for its implementation. If it can be ascertained that removing this '1' does
not lead to degradation of system performance, the function can be implemented
in an inexact manner, as given in Figure 3.1b. Also the single ‘1’ that corresponds
to a`bcd` needs more gates as shown in Figure 3.1b. If an extra ‘1’ can be added
at a`b`cd` the hardware required to implement the function is reduced.
The error in the system depends on the number of input vectors whose
output is altered, and the total number of input vectors. In the example function
considered, the inexact versions induce a 12.5% error (2 vectors in 16) in the
system assuming all vectors occur equally likely. The circuit designed from this
inexact version of the system will consume less power, occupy less area, and
may also have a shorter critical path delay, if the neglected term was originally
the only term at its level of the critical path. Not all decision circuits can be
designed in an inexact fashion. The extent of inexactness has to be quantified,
and its effect on the system as a whole has to be analyzed extensively before
such a step is taken.
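The error metric just defined — altered vectors over total vectors, with all vectors equally likely — can be computed directly. The two lambda functions below are hypothetical stand-ins for an exact and an inexact circuit, chosen to disagree on exactly 2 of the 16 input vectors, mirroring the Figure 3.1 example:

```python
from itertools import product

def error_rate(exact_fn, inexact_fn, n_inputs):
    """Fraction of input vectors whose output is altered, all vectors equiprobable."""
    vectors = list(product([0, 1], repeat=n_inputs))
    altered = sum(exact_fn(*v) != inexact_fn(*v) for v in vectors)
    return altered / len(vectors)

# Hypothetical 4-input functions: the exact circuit has a lone minterm at 1000,
# the inexact one replaces it with a minterm at 0010.
exact_fn   = lambda a, b, c, d: int((b and d) or (a and not b and not c and not d))
inexact_fn = lambda a, b, c, d: int((b and d) or (not a and not b and c and not d))

print(error_rate(exact_fn, inexact_fn, 4))  # -> 0.125 (2 vectors in 16)
```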
The application scenarios for inexact circuits can be classified into 3 broad
categories depending on the impact of inexactness on the system accuracy. It is
important to note that this accuracy is not the error in the circuit itself, but its
reflection on the system output. Applications are classified into 3 categories
where inexactness has:-
a) No Impact on system accuracy
b) Tolerable Impact on system accuracy
c) Significant Impact on system accuracy
Applications of the 1st category include branch prediction, bus
coding, cache replacement, and coder/decoders, where either decisions are made
only for the purpose of additional optimization, or the decision bits are available to
discern the operation that was performed. Applications belonging to the 2nd
category are more numerous, including a large number of image processing
applications, timer applications, network stacks, etc. Here, the error made can be
tolerated, owing to the malleable nature of the application. The 3rd category
of applications, which do not tolerate any error, are generally part of a defined
state machine or are used for the sake of reliability. This category consists of
error correcting codes, voters for redundancy, etc.
Without loss of generality, applications falling under the first two categories
were chosen for analysis. These applications are chosen such that they share
some circuitry that can be implemented in inexact logic. Applications belonging to
the first category are discussed in the next chapter, and those pertaining to the
second class are discussed in the chapter following the next. Design
methodologies to build inexact logic circuits are presented after that.
Chapter 4
Applications under the No Impact Category
In this chapter, two applications where inexactness does not affect the
data accuracy of the system namely, (i) the bus invert application and (ii) a Page
replacement scheme are elaborated. The decision circuit used in the former
application is a Hamming threshold voter, and in the latter, a comparator.
4.1 Bus Coding
Recent advances in computing applications like graphics and scientific computing
demand data transfer at such high rates that bus interfaces are being constantly
pushed to higher performance points. These applications are highly memory
intensive rather than just CPU intensive. They need an enormous amount of
data to be transferred for computation which has increased bandwidth
requirements of off-chip busses. This in turn entails higher frequencies and
hence higher power consumption.
Reducing this off-chip bus power consumption has become one of the key
issues for low power system design. The fact that the power consumed in bus
accesses accounts for a significant fraction of the total power consumed in VLSI
(Very Large Scale Integration) systems has been independently
established by many researchers [58-60]. Numerous techniques have been
proposed in the past in order to reduce the effect of self-capacitance and
coupling-capacitance of buses on the power dissipation of an integrated circuit.
Wire shaping [61], buffer insertion [62], and several bus coding schemes [63-67]
have been used to reduce power dissipation due to coupling-capacitance
between adjacent wires. Mitigating the effect of self-capacitance involves the
reduction of data transitions on the bus. This involves some form of data coding.
Generally, a decision making circuit is required to ascertain whether the data has
to be coded or not. Most techniques like this involve a majority voter, or a similar
circuit.
Figure 4.1. Bus Invert Block Diagram
The first major initiative in bus coding schemes was the Bus-invert coding
scheme [58], represented in Figure 4.1. A number of bus coding techniques that
followed were modifications of the Bus-invert technique. However, Bus Invert
remains one of the very few practically used bus coding schemes, finding
application in Double Data Rate (DDR) Synchronous Dynamic Random Access
Memory (SDRAM), and other bus architectures.
Bus invert works by counting the number of transitions, which involves
XORing of the present and previous data. If the number of transitions is more
than half the bus width, the inverted data is transmitted, else the original data is
transmitted. A separate line is also added to the bus which will carry the decision.
The decision bit will signify whether the data that is on the bus is the original data
or its complement. The bus invert algorithm is explained below:
Algorithm 2: Bus Invert _
1. Count the transitions between the data on the bus and the next data that is to
be put on bus
2. if transitions count <= half of the bus width
2.1. Assign next data to bus
3. else
3.1. Invert the next data and assign the complement to bus
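Algorithm 2 amounts to a transition count followed by a conditional complement. Here is a Python model exercised with the data from Table 4.1 (illustrative only; the function name is not from the cited scheme):

```python
def bus_invert(current, nxt, width=8):
    """Return (data placed on the bus, invert-line value)."""
    transitions = bin(current ^ nxt).count("1")
    if transitions <= width // 2:
        return nxt, 0                 # send the data unchanged
    mask = (1 << width) - 1
    return nxt ^ mask, 1              # send the complement

# Data from Table 4.1: 6 transitions out of 8 bits, so the data is inverted
current, nxt = 0b10101011, 0b01110101
sent, invert = bus_invert(current, nxt)
print(invert, bin(current ^ sent).count("1"))  # -> 1 2 (transitions cut to N - t = 2)
```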
A block diagram of the system is shown in Figure 4.1. A sample decision
making process is shown in Table 4.1.
Table 4.1: Bus Invert Decision Making
Bit No. 1 2 3 4 5 6 7 8
Current Data on bus 1 0 1 0 1 0 1 1
Next Data to be put on Bus 0 1 1 1 0 1 0 1
XOR of present and next data 1 1 0 1 1 1 1 0
In the given example the number of transitions is 6, which is more than
half the bus width, 4. So the data is inverted and then sent. The decision is sent
on a separate line. An XOR between the current data and the next data that is
put on the bus shows that the transitions are reduced to 2. This is given by (N-t),
where N is the bus width and t is the original number of transitions. The encoding
process is shown in Table 4.2.
Table 4.2: Bus Invert Decision Encoder
Bit No. 1 2 3 4 5 6 7 8
Next Data to be put on Bus 0 1 1 1 0 1 0 1
Next Data that is put on Bus 1 0 0 0 1 0 1 0
Current Data on bus 1 0 1 0 1 0 1 1
XOR of current and next data 0 0 1 0 0 0 0 1
The whole operation involves a chain of full adders to count the transitions
and then perform another XOR on the data that has to be sent. All these
operations have to be done before the next data arrives at the bus. The parallel
XOR array and the chain of full adders contribute to the delay in taking a decision.
Beyond this the encoder delay also has to be taken into account which involves a
parallel XOR to perform controlled inversion. This entire set of operations has to
be over by the time the next data arrives leading to a restriction on bandwidth.
Figure 4.2. K-Map of an Exact Majority Voter
Figure 4.3. K-Map of an Inexact Majority Voter
The major bottleneck in implementing such a scheme is the Majority Voter,
which is a special case of a Hamming threshold voter, explained in chapter 2. An
inexact version of the majority voter was designed by using the previously
discussed techniques. For example, using the guidelines mentioned later in
chapter 6, the minterms of an exact voter (Figure 4.2) were manipulated to look
like as in Figure 4.3. The circuits for the same are shown in Figures 4.4, 4.5, and
4.6.
Figure 4.4. 8-bit Majority Voter Circuit (FA – Full Adder)
Figure 4.5. 3-out-of-4 Block
Figure 4.6. 8-bit Inexact Majority Voter Circuit
The efficiency of the inexact circuit for the case of bus-invert was
compared with the same system using an exact majority voter. The efficiency is
taken in terms of the reduction in transitions. The inexact majority voter will
process certain input vectors in a wrong manner. The results of the different
inexact versions generated are compared with an exact voter as shown and
discussed in the results section.
The inexact voters were also used for a serial bus coding scheme
proposed by the author in [18]. This work outlined a novel Transition Inversion
based data coding protocol by which the transitions on the data line can be
reduced for synchronous serial buses like JTAG, SPI, I2C etc. In serial data
transfer, data is generally loaded onto a buffer in parallel or serial fashion and
then placed on the bus serially. The algorithm first determines the number of
transitions in the data word. If the serial data buffer is loaded in parallel, then a
majority voter circuit is used to count the number of transitions. Serially, the same
process can be done either by using an XOR gate between consecutive bits and
counting the '1's, or by a counter on the line that counts on both edges.
If the number of transitions is more than half the word length, the
transition states between the bits are inverted. In case transition inversion is
needed, the scheme operates by observing the transition states between any 2
bits. Accordingly, the encoded second bit is retained as the previous encoded bit
if there is a transition. If there is no transition, the previous encoded bit is inverted.
The decision bit signifying transition inversion is transmitted before transmitting
the encoded data. This results in an overhead for the system.
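The serial encoding rule above — keep the previous encoded bit where the original word has a transition, invert it where it does not — can be modelled in Python. This is an illustrative sketch of the scheme, not the hardware, and the function names are assumptions:

```python
def count_transitions(bits):
    return sum(p != q for p, q in zip(bits, bits[1:]))

def transition_invert(bits):
    """Encode the word if it has more transitions than half its length.
    Returns (encoded word, decision bit)."""
    if count_transitions(bits) <= len(bits) // 2:
        return list(bits), 0
    encoded = [bits[0]]
    for prev, cur in zip(bits, bits[1:]):
        if cur != prev:                      # transition in the original word
            encoded.append(encoded[-1])      # -> no transition in the encoding
        else:                                # no transition in the original word
            encoded.append(encoded[-1] ^ 1)  # -> transition in the encoding
    return encoded, 1

word = [1, 0, 1, 0, 1, 0, 1, 1]              # 6 transitions in an 8-bit word
enc, decision = transition_invert(word)
print(decision, count_transitions(enc))      # -> 1 1
```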
Figure 4.7. High Level Architecture for the Transition Inversion scheme
The decision of the transition inversion is made depending on the count of
the transitions and is stored. The circuit used for this purpose depends on
whether the data is loaded serially, or in a parallel fashion. The bit stream is
encoded if a transition inversion is needed. The bit stream is encoded on the fly
as the data is put on the bus, as shown in Figure 4.7. In the receiver the decoder
has to decode the incoming bit stream and recover the original data. If the serial
buffer is loaded in parallel (Parallel-In-Serial-Out), then the decision circuitry has
to be implemented as combinational logic. This implementation of the transition
counter and decision circuit is built using a majority voter circuit. This is replaced
by an inexact voter, and the performance is compared with regard to the
reduction in transitions. This is presented in the results section.
4.2 Translation Lookaside Buffer (TLB)
A TLB [68] is a CPU cache that memory management hardware uses to
improve virtual address translation speed. It was the first cache introduced in
processors. All current desktop and server processors (such as x86) use a TLB.
A TLB has a fixed number of slots that contain page table entries, which map
virtual addresses to physical addresses. Virtual memory is the address space
seen by a process and can be larger than the physical memory. This space is
divided into pages of a predetermined size. Generally only some pages are
loaded into physical memory, in locations depending on the page replacement
policies. In a computer operating system that uses paging for virtual memory
management, page replacement algorithms decide which memory pages to page
out (swap out, write to disk) when a page of memory needs to be allocated.
Paging happens when a page fault occurs and a free page cannot be used to
satisfy the allocation, either because there are none, or the number of free pages
is lower than some threshold.
When the page, that was selected for replacement and swapped out, is
referenced again, it has to be swapped in (read in from disk), and this involves
waiting for I/O completion. This determines the quality of the page replacement
algorithm: the less time waiting for page-ins, the better the algorithm. A page
replacement algorithm looks at the limited information about accesses to the
pages provided by hardware, and tries to guess which pages should be replaced
to minimize the total number of page misses, while balancing this with the costs
(primary storage and processor time) of the algorithm itself. Of the several
existing page replacement policies [69-72], Least Recently Used (LRU) comes
closest to the optimal performance that can be achieved. But due
to the hardware complexity of its implementation for larger TLBs, it is generally
replaced by a Clock or Aging algorithm, which coarsely approximates LRU.
The aging algorithm is a descendant of the Not Frequently Used (NFU)
algorithm, with modifications to make it aware of the time span of use, thus
making it an approximation of LRU as well. Instead of just incrementing the
counter of every referenced page (which puts equal emphasis on page references
regardless of their age), the reference counter of a page is first shifted right
(divided by 2) before the referenced bit is added to the left of that binary number.
For instance, if a page has reference bits 1,0,0,1,1,0 over the past 6 clock ticks, its
reference counter will evolve as follows: 10000000, 01000000, 00100000, 10010000,
11001000, 01100100. Page references closer to the present have more
impact than page references made long ago.
This ensures that pages referenced more recently, even if referenced less
frequently, have higher priority than pages referenced more frequently in the
past. Thus, when a page needs to be swapped out, the page with the lowest
counter is chosen. This is explained with the help of Figure 4.8.
Figure 4.8. Aging Page Replacement Algorithm Illustrated
Figure 4.8 represents a page table with six entries. Working from right to
left, the state of each of the pages (only the counter entries) at each of the six
clock ticks are shown. Consider the (a) column. After clock tick zero the R flags
for the six pages are set to 1, 0, 1, 0, 1 and 1. This indicates that pages 0, 2, 4
and 5 were referenced. This results in the counters being set as shown. It is
assumed they all started at zero so that the shift right, in effect, did nothing and
the reference bit was added to the leftmost bit. From the clock tick in (b), the
algorithm can be followed and extended similarly for clock ticks (c) to (e). When a
page fault occurs, the page whose counter has the lowest value is removed. Clearly,
a page that has not been referenced for, say, four clock ticks will have four
zeroes in the leftmost positions of its counter, and hence a lower value than a page
that has not been referenced for three clock ticks. Hardware support for this
consists of dedicated comparators that determine the smallest age value. This
comparator is replaced with
an inexact comparator. Performance comparison is made in terms of the ratio of
page faults.
It can be observed that aging differs from LRU in the sense that aging can
only keep track of the references in the latest 16/32 (depending on the bit size of
the processor's integers) time intervals. Consequently, two pages may have
reference counters of 00000000, even though one page was referenced 9
intervals ago and the other 1000 intervals ago. Generally speaking, knowing the
usage within the past 16 intervals is sufficient for making a good decision as to
which page to swap out. Thus, aging can offer near-optimal performance for a
moderate price.
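The counter update at the heart of the aging algorithm can be modelled in a few lines. This is an illustrative software sketch (the thesis's version is hardware); 8-bit counters are assumed, matching the worked example above.

```python
def age_tick(counters, referenced):
    """One clock tick of the aging algorithm with 8-bit counters:
    shift each counter right, then add the R bit as the new leftmost bit."""
    return [((c >> 1) | (r << 7)) & 0xFF for c, r in zip(counters, referenced)]

def victim(counters):
    """Page chosen for replacement: the one with the lowest counter value."""
    return min(range(len(counters)), key=lambda i: counters[i])
```

Feeding the reference bits 1,0,0,1,1,0 into a counter that starts at zero reproduces the sequence in the text, ending at 01100100.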
The decision circuit used in such a scheme, as mentioned above, is a
series of comparators [73][74] which can be replaced with inexact versions,
leading to a decrease in power dissipation. The performance of the algorithm,
with the exact and inexact comparators is compared and contrasted, in the
results section.
Chapter 5
Applications under the Tolerable Impact Category
This chapter deals with applications which can tolerate errors in decision
circuits used in them. Both the applications are image processing algorithms. The
first is a Median Filter based blurring technique which involves a majority voter in
its implementation. The second is a Frequency based image blurring technique
which involves the use of a comparator, followed by a Discrete Cosine Transform
(DCT).
5.1 Majority voter based Rank order Median Filter
Median filters are typically used in image processing operations such as blurring.
Blurring an image, by itself, has various applications ranging from noise filtering
to improving compression ratios. In this technique, a window of pixels
surrounding every pixel is taken as a list and sorted. Then the middle value is
taken as the median and is assigned to the same location in the output image.
A straightforward hardware implementation of a median filter is an extremely
complex design since it involves sorting. So, generally, order statistic [75-78]
methods are used to determine the middle value directly. This involves voting on
each bit position of the list of data. The rank order filter [75] works by first voting on
the MSB. Then the vote bit is compared with the MSB of each data word. Those data whose
MSB differs from the vote bit have their remaining bits changed to their MSB. This is
done for all bit positions. A majority voter is required for making the vote
decision. The detailed algorithm is as follows:
Algorithm 3: Rank Order Median Filter Algorithm _
1. repeat for all bit positions
1.1. Count 1s and 0s in the current bit position in all pixel data
1.2. if No. of 1's > No. of 0's
1.2.1. set variable vote=1
1.3. else
1.3.1. set variable vote=0
1.4. repeat for all pixel data
1.4.1. if bit in current bit position not equal to vote
1.4.1.1. change remaining bits in pixel to current bit
1.4.2. else
1.4.2.1. leave bits unchanged
1.5. end loop at 1.4
2. end loop at 1
_
A Hamming comparator based architecture was presented in [79]. The Hamming
comparator in this circuit is replaced by an inexact voter and comparator, and the
performance is compared with the exact version. The performance metric is
explained later.
Figure 5.1. Rank-order filter algorithm (median detection, for n=5)
The operation of the algorithm of [75] is illustrated in Fig. 5.1, which shows
five competing values (1, 2, 5, 7, and 14), coded with 4 bits each. In this example,
k=3, so the median must be detected. The value of k defines the threshold t of
the filter, that is, t=k (so t=3 in this example). Since this is a bit-serial circuit, 4
steps are required, one for each bit of the competing values. The operation of the
filter is straightforward: it verifies the number of 1’s among the input bits,
producing y=1 at the output if the number of 1’s is greater than the number of 0's,
or y=0 otherwise; now, if y=0, then the remaining bits of each input word whose
bit presently applied to the filter is 1 are set to 1, while y=1 causes the remaining
bits of each input word whose bit presently applied to the filter is 0 to be set to 0.
This can be evidenced in the last row of the second column, and the first 2 rows
of the 3rd column in Figure 5.1.
In the example of Figure 5.1, when the MSB’s (vertical box in the leftmost
stack) are presented to the filter, y=0 is produced (because there is only one 1),
and so all bits of the bottom vector are set to 1 (indicated with a horizontal box in
the next stack). When the next bit (vertical box in the second stack) is presented,
y=1 is obtained at the output (there are three 1’s now), so the remaining bits of
the top two rows are set to 0 (horizontal boxes in the next stack). When the third
bit is presented, y=0 is produced (there are two 1’s), and the remaining bits of the
bottom two rows are set to 1 (though the last row had already been set to 1).
Finally, y=1 is produced when the last bit is presented to the filter (there are
again three 1’s). The result is, therefore, y=0101 (decimal 5), which is indeed the
median.
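The bit-serial rank-order behaviour and the worked example above can be checked with a small software model. This is an illustrative sketch, not the thesis's Verilog implementation, and it models only the exact (error-free) voter.

```python
def bit_serial_median(values, width):
    """Software model of the bit-serial rank-order median filter of [75]:
    at each bit position, vote on the majority bit; every word that loses
    the vote has all its remaining (lower) bits forced to its losing bit."""
    words = [[(v >> (width - 1 - i)) & 1 for i in range(width)] for v in values]
    out = []
    for i in range(width):
        ones = sum(w[i] for w in words)
        y = 1 if ones > len(words) - ones else 0   # majority vote on column i
        out.append(y)
        for w in words:
            if w[i] != y:                          # losing word: cascade its bit down
                for j in range(i + 1, width):
                    w[j] = w[i]
    return int("".join(map(str, out)), 2)
```

Running it on the five competing values of Figure 5.1 yields 0101, i.e. 5, the true median.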
The diagram of the generic rank order filter as prescribed in [79] is
presented in Figure 5.2. The Hamming Comparator (HC) used here is a Majority
Voter, preceded by ‘n’ logic units (LUs). Additionally, an optional D-type flip-flop
can be used to store the output bit (yout <= y). There are ‘n’ digital input words,
denoted by x1, x2, ..., xn. The clock and reset signals are denoted by CLK and
RST, respectively. The inputs of the HC (d1, d2,... ,dn) are provided by the LUs,
to which the output of the HC (y) is fed back. Notice also in Figure 5.2 that the
input words are presented to the filter serially, starting with the MSB.
Figure 5.2. Complete diagram of the bit-serial generic rank filter (LU is in Figure 5.3)
Figure 5.3. Diagram of the logic unit (LU).
The Majority Voter here was designed in an inexact manner, as with the
bus coding technique. The voter required here is a 9-bit voter. Multiple inexact
variants were designed with varying levels of inexactness. Since this scenario
involves modification of data compared to the exact operation, a system level
error metric was defined. This metric is a perceptive blur metric to determine the
annoyance a blurred image can induce in a subject [95]. This metric takes values
from 0 to 1, which stand for best and worst respectively in terms of blur
perception. This is discussed in detail in the results section, along with the results
of the analysis.
5.2 Frequency based Image blurring
Image blurring is also done in frequency domain [80] in addition to spatial.
In this, the frequency content of the image is determined and only the low
frequency components are taken for generating the output image. This is done in
accordance with the nature of the frequency matrix of an image [81]. The
two-dimensional frequency matrix has as its axes the frequency components
along the x and y axes of the image. The two frequency axes, increasing from left
to right and from top to bottom, index the magnitude and phase of each frequency
component in the matrix. The origin, the (0,0) point, is the point which has
no changes in x or y axis in the spatial image and represents the DC component
of the image (the average of the entire image). It is also known that with increasing
frequencies, the magnitude falls off such that the point diagonal to the origin will
have very little amplitude. This point will represent the highest frequency content
in the image and preserves fine details. The in-between frequencies contribute to
varying levels of detail. Since blurring is about smoothing out fine detail and
having a smoother gradient, these high frequency components have to be
removed for achieving the effect.
This blurring requires having a threshold and selecting the frequency
components based on that. It needs a comparator [82]. This comparator was
designed in an inexact fashion and applied to the blurring process. Multiple
comparators with varying levels of inexactness were designed. As before the
perceptive blur metric was used to compare the outputs of both the exact and
inexact systems. The exact comparator and one of the inexact comparators are
shown in Figures 5.4 and 5.5. Complete performance analysis is presented in the
results section.
Figure 5.4. Exact Comparator
Figure 5.5. Inexact Comparator
Key: E – Both input terms are equal. Differently shaded boxes indicate the changes made.
Chapter 6
Design Methodology of Inexact Circuits
In this chapter, various design methodologies are presented to build
inexact circuits. First, a manual K-Map based approach is presented; this can
be used at the discretion of the designer, and only for smaller circuits. Next, a
heuristic framework, used for the purposes of this thesis, is presented to
generate inexact circuits with varying levels of inexactness. Finally, other processing
techniques are presented that can be utilized to obtain inexact circuits. The
image processing approach is of particular interest.
6.1 K-Map based Approach
Initially, a K-Map based approach was adopted for designing inexact
versions of smaller circuits. The following guidelines are presented to aid in such
a process. A collection of minterms which can be grouped in order to facilitate
Boolean minimization is called a grouping. Every minterm (or maxterm) can be
represented by the number of groupings it is part of, and a normalized weight
term depending on the circuit. From the above parameters, a decision can be
made on addition or removal of minterms, as follows:
a) As long as the minterm does not destructively affect the circuit complexity, it
can be removed. The removal of a minterm is destructive if it reduces the size of
a grouping or splits an existing grouping of terms into smaller groupings.
b) A new minterm can be added if it aids in adding additional groups, or in
increasing the size of an existing group.
c) On the basis of weight, a minterm can be removed if its normalized weight is
less than a certain threshold. This threshold has to be set by the system designer.
d) A simpler approach can be followed, where 'filling the holes' or 'trimming the
edges' appropriately in the K-Map can aid in deriving a smaller logic function.
This is because plugging holes most often increases groupings, and removing
corner terms can lead to a more efficient grouping.
The major drawback of the method prescribed above is that the onus is on
the system designer to manually determine weight thresholds, and to ascertain
whether terms can be added or removed from the logic function. However, an
experienced designer can make a few educated guesses, as to which terms to
modify. But this is not the expected procedure to design more complex circuits.
This can give way to an image processing based approach to creating an inexact
K-Map by blur-like processes, and to extract the new logic function from this. This
can be minimized and converted to a gate level netlist using any of the standard
Boolean minimization methods (ESPRESSO, Quine-McCluskey, etc.) mentioned
in Chapter 2. This is elaborated at the end of this chapter. But since multiple
inexact circuits are required, a heuristic design space exploration approach is
pursued.
6.2 Heuristic Framework for Inexact Circuit Generation
To generate several inexact versions of a given logic function, with respect
to a set of predetermined optimization parameters, a heuristic framework has
been designed and implemented. The framework is driven by a Genetic
Algorithm (GA), which searches the design space, using the 'survival of the fittest'
criteria, as explained in Chapter 2.
The existing literature on circuit minimization using evolutionary algorithms
[83][84], and works derived from it, like [85][86], makes use of a 2D layout of
smaller programmable logic functions, with programmable inputs for each
row/column. Since our objective is not to minimize the given function/circuit, but
to generate inexact versions of a logic function, the representation of the
chromosome differs. However, the fitness function of the existing algorithms can
be modified to provide a tolerance for functional error. Since an optimized
framework is not required for the current purpose, a simplistic approach was
taken.
6.2.1 Chromosome Representation
For the required framework, the chromosome is built from the output
vector of the given logic function. For example, a Boolean function with 4 inputs
will have a bit stream of length 16 (2^4) as its chromosome. The original
chromosome of an exact full adder will be 01101001 for the sum, and 00010111
for the carry. Multiple functions can be minimized independently or together, to
make use of common logic.
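This chromosome construction can be sketched as follows. The helper name and input ordering (MSB-first rows of the truth table) are our illustrative assumptions; the full-adder vectors match those quoted above.

```python
def chromosome(fn, n_inputs):
    """Output-vector chromosome: fn evaluated over all 2^n truth-table rows."""
    bits = []
    for row in range(2 ** n_inputs):
        ins = [(row >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        bits.append(fn(*ins))
    return "".join(map(str, bits))

# Exact full adder, expressed as Boolean functions of (a, b, cin)
full_adder_sum = lambda a, b, cin: a ^ b ^ cin
full_adder_carry = lambda a, b, cin: (a & b) | (b & cin) | (a & cin)
```

Evaluating the two full-adder outputs over all eight input rows reproduces the chromosomes 01101001 (sum) and 00010111 (carry).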
6.2.2 Fitness Function
Finding the appropriate fitness function is important since it is responsible
for quantifying the way a chromosome or individual meets the requirements of
the final goal of optimization. This function evaluates an individual taking into
account some constraints for Boolean synthesis that usually are: (1) getting the
appropriate input–output behaviour and (2) the minimum number of logic gates.
Other constraints that can be added are propagation delays or type of logic gates
available. The appropriate input-output behaviour in this work is the inexactness.
The fitness function, for the framework, is derived from the hardware requirement
of the circuit implementation of the generated function in terms of area, and the
inexactness of the function. Fitness is high when the inexactness is low and the
circuit requirement is also low. Fitness reduces with increase in any of the
parameters. So the inverse of the product of the two can be used as the fitness
function. If required, the overall system performance can also be included in this
regard. The special case of generating the exact circuit is treated separately during
simulation.
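A minimal sketch of such a fitness function, under the assumptions that inexactness is measured as the number of differing output-vector positions and that area is a synthesized-circuit cost supplied by the designer:

```python
def inexactness(chrom, exact):
    """Number of output-vector positions that differ from the exact function."""
    return sum(a != b for a, b in zip(chrom, exact))

def fitness(chrom, exact, area):
    """Inverse of (inexactness x circuit area), as described above; the
    exact chromosome (zero inexactness) is the special case handled
    separately during simulation."""
    err = inexactness(chrom, exact)
    if err == 0:
        return float("inf")
    return 1.0 / (err * area)
```

Fitness falls as either the functional error or the hardware cost grows, matching the inverse-product rule stated in the text.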
6.2.3 Genetic Operators
The selection operator is responsible for identifying the best individuals of
the population taking into account the exploitation and the exploration [83] of the
design space. This firstly allows the individuals with better fitness to survive and
reproduce more often. Secondly, it can provide the means for searching in more
areas and making it possible to find better results. The Roulette-Wheel selection
rule is used in the framework. Roulette-Wheel selection, also known as Fitness
proportionate selection, is a genetic operator used in genetic algorithms for
selecting potentially useful solutions for recombination. In fitness proportionate
selection, as in all selection methods, the fitness function assigns a fitness value
to possible solutions or chromosomes. This fitness level is used to associate a
probability of selection with each individual chromosome. If fi is the fitness of
individual i in the population, its probability of being selected is the ratio of fi over
the sum of all fi's.
The mutation operator modifies the chromosome randomly in order to
increase the search space. It can change: (1) an operator or variable and (2) a
segment in the chromosome. A variable mutating probability during the execution
of the algorithm (evolvable mutation) is more effective for hardware evolution.
The mutation rate is generally defined as the percentage of genes (bits) in a
single genotype (output vector chromosome) that are randomly mutated. The
mutation rate needs to be adjusted if the genotype length is too small, to
prevent zero mutation. Generally speaking, a mutation rate which results in 4 or 5
genes being changed in each genotype is suitable. This results in a mutation
probability of 2% for a chromosome of length 256 (which is the length of the
chromosome for the circuits used in this thesis).
Crossover is generally not preferred in GA based circuit design [6.1], as it
does not apply well when the chromosome has a predefined hardware
architecture: crossover generally produces chromosomes that differ widely
from the parent chromosomes. In the case of the proposed framework,
crossover may expose newer exploration areas, but in keeping with the
literature, crossover is not used. To compensate for the absence of crossover,
an aggressive approach was taken towards mutation, by using an 8% mutation
probability.
6.2.4 The Algorithm
The evolutionary algorithm used to produce all of the evolved circuit
designs in this work is a simple form of (1+λ) evolutionary strategy [87], where
λ is usually about 4. Experiments were reported in [88] which indicated the
efficiency of this approach. The algorithm is as follows:
Algorithm 4: Genetic Algorithm _
1. Add Exact function vector to the initial population.
2. Complete initial population with mutated versions of exact function.
3. Evaluate the fitness function for each individual in the population.
4. repeat
4.1. Copy fittest individuals into new population, following selection rule.
4.2. Mutate selected individuals, to complete the new population.
4.3. Calculate the fitness for the new population.
4.4. until convergence
_ _
The convergence condition here depends on the fitness function.
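Algorithm 4, together with the bit-flip mutation described earlier, can be sketched as a (1+λ) strategy in software. This is a toy model: a fixed generation budget stands in for the convergence test, and the function names are illustrative.

```python
import random

def mutate(chrom, rate, rng):
    """Flip each gene (output bit) independently with probability `rate`."""
    return "".join(b if rng.random() > rate else "10"[int(b)] for b in chrom)

def evolve(exact, fitness_fn, lam=4, rate=0.08, generations=50, seed=1):
    """(1+lambda) strategy per Algorithm 4: seed the population with the
    exact function, keep the fittest individual each generation, and refill
    the population with its mutants."""
    rng = random.Random(seed)
    best = exact
    for _ in range(generations):
        pop = [best] + [mutate(best, rate, rng) for _ in range(lam)]
        best = max(pop, key=fitness_fn)   # elitism: best always survives
    return best
```

Because the current best is always carried into the next population, the fitness of the returned individual can never fall below that of the starting chromosome.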
For the purpose of this work, a set of inexact versions of a Hamming threshold
voter and a comparator was generated using the framework. These inexact
circuits are tested with a variety of applications, as mentioned in the previous
chapters.
6.3 Other Methods to design Inexact Circuits
Similar to the manual pattern recognition involved in deriving a Boolean
function from a K-Map, pattern manipulation [90] methods can be applied to
automate the process of designing inexact circuits. The manual method of
looking into a K-Map to deduce ideal combinations to group minterms is just a
form of clustering. This can be used to guide group manipulation towards
deriving inexact circuits, by starting with a preconceived notion of what the
outcome should be. Influencing the method to favour some desired outcome in
this way is what introduces the inexactness. Manipulation can be done using the various
techniques one typically uses to process data. Typical data processing can vary
from simple matrix manipulation to image processing to data mining. One form of
manipulation is discussed in the next few paragraphs.
Matrix manipulation can be done by simply assuming the K-Map to be just
a matrix and operating on it. This can introduce inexactness in various ways. One
way could be to check whether the number of ‘1’s in a row/column is above some
upper threshold, and if so simply make the entire row/column ‘1’. Together with
this, a lower threshold can also be used: if the number of ‘1’s is less than the
lower threshold, the entire row/column is made ‘0’.
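The row-wise version of this manipulation can be sketched as follows (illustrative; the thresholds are designer-chosen, and the column case is symmetric, obtained by transposing the map):

```python
def threshold_rows(kmap, upper, lower):
    """Row-wise K-map manipulation: saturate rows with many 1s to all-1s,
    clear rows with few 1s to all-0s; rows in between are left untouched."""
    out = []
    for row in kmap:
        ones = sum(row)
        if ones > upper:
            out.append([1] * len(row))    # dense row: fill with 1s
        elif ones < lower:
            out.append([0] * len(row))    # sparse row: clear to 0s
        else:
            out.append(list(row))
    return out
```

Filling a nearly-full row enlarges a grouping (adding minterms), while clearing a nearly-empty row prunes stray minterms, both at the cost of inexactness.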
The K-Map can also be operated upon as an image. One immediate
technique which can be applied is compression. Image compression
reduces the size an image occupies. This is generally done in a lossy manner,
which involves removing some details in the image, like edges with sharp
transitions that are less amenable to compression. To get better compression
in such cases, blurring can be used. Blurring a K-Map removes finer details,
giving it a hazy look that occupies less space, and results in data that is no
longer just ‘1’ or ‘0’: the data elements now vary continuously between 0 and 1.
followed by thresholding to extract the information in the K-Map. Thresholding is
used to generate a binary image out of a grayscale image. This is a process
where the image values are compared with some threshold and the output image
values will be set to either 1 or 0 based on the result. So in K-Map blurring, the K-
Map image, which is originally a binary image, is blurred into a grayscale image.
This grayscale image is converted into a binary image again by thresholding.
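The blur-then-threshold pipeline can be sketched with a simple 3x3 box blur. The kernel choice and the threshold value are illustrative assumptions; any low-pass filter would serve.

```python
def blur_kmap(kmap):
    """3x3 box blur treating the K-map as a grayscale image; cells near the
    map's edge average over only the neighbours that exist."""
    rows, cols = len(kmap), len(kmap[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [kmap[i][j]
                    for i in range(max(0, r - 1), min(rows, r + 2))
                    for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(nbrs) / len(nbrs)
    return out

def threshold(img, t):
    """Convert the blurred (grayscale) map back into a binary K-map."""
    return [[1 if v >= t else 0 for v in row] for row in img]
```

An isolated minterm is smeared below the threshold and disappears, while a solid group of minterms survives; the surviving map defines the inexact logic function.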
Another field of data processing that can be useful is data mining [91].
Data mining is used to extract useful information out of patterns of data. Some
operations which are typically done are clustering, outlier detection etc. In
clustering, data clusters are formed based on some criteria. Algorithms can be
designed which manipulate the map to give better clustering of minterms in a K-Map.
This is a more scalable way of doing K-Map manipulation. Of particular
interest is outlier detection which involves the identification of elements which do
not fit into the pattern. By using outlier detection on K-Map data, those minterms
which do not fit properly into a group can be identified and pruned.
Chapter 7
Results

For the purpose of analysis, multiple inexact circuits of the required
Boolean function are generated using the heuristic framework presented in the
previous chapter. Individuals of varying inexactness were selected from the
population resulting from 1000 generations, and not from different generations.
These circuits are compared and contrasted in terms of power, and impact on
system performance. It is to be noted that the proposed method of generating
these circuits is not the optimal technique in terms of quality of results. But it
serves as a good tool for design space exploration, and analysis. All the circuits
were implemented in Verilog [92], and synthesized using the Synopsys tool chain.
The technology node utilized is 180nm. The analysis was subject to the following
operating conditions: a clock frequency of 100MHz, supply voltage of 3.3 V, and
a wire capacitance of 3 picofarads (pF). Experimental details pertaining to
specific applications are discussed in the respective sections.
7.1 Bus Coding

For the bus coding application, the stimulus for the circuits was taken
from running the SPEC2000 [93] benchmark files, and obtaining a memory trace.
The parameter to be measured is the percentage reduction in transitions on the
bus. This is a measure of the power reduction obtained from encoding data on
the bus. This analysis is represented in Figure 7.1 as the variation of percentage
reduction in transitions to the level of inexactness. In a general design scenario, a
number of smaller order circuits are used to build higher order circuits of the
same kind, as is the case with comparators, adders, decoders, and multiplexers.
Similarly, for the circuit under test, higher order voters are built from a number of
smaller order voters. The level of inexactness is with respect to the smaller order
voters, which in this case is the 8-bit voter. A 16-bit voter is built from two 8-bit
voters, and a 32-bit voter is built from four 8-bit voters, and some added circuitry.
Figure 7.1. Percentage reduction in transitions with varying inexactness
In Figure 7.1, there is some correlation between the level of inexactness
and the percentage transition reduction. The transition reduction decreases as
the level of inexactness increases. But this cannot be generalized, as seen in the
case where the bus width is 16. This can be due to the way a 16-bit voter is
constructed from 8-bit voters. This, along with the stimulus, may be the cause of
the anomaly mentioned before. But the upward trend indicates that for the
inexact 16-bit voters, the impact of inexactness on the system performance is
positive.
The results of the hardware analysis of the different voters are presented
in Figures 7.2, 7.3, and 7.4. As seen from the graphs, the circuit cost has no
correlation with the level of inexactness. This is not a surprise, as circuit
complexity depends specifically on the pattern of the input vectors which result in
the desired I/O characteristics.
Figure 7.2. Dynamic Power Dissipation varying with inexactness (8-bit)
Figure 7.3. Circuit Area varying with inexactness (8-bit)
Figure 7.4. Critical Path Delay varying with inexactness (8-bit)
The inexact voters also provide a speed advantage and an area
advantage, which directly results in lower leakage power. The power trend is
similar to the area trend as the reduction in power dissipation is obtained due to
the reduction in the circuit size required to implement the logic function. The
overall advantage of using inexact circuits can be gauged by comparing the
actual power reduction that can be obtained in a typical bus, after accounting for
the power dissipation of the required decision making and encoding circuitry.
Figure 7.5. Comparison of overall power saved by the Bus Invert with inexactness
This analysis is carried out in theory, with typical operating conditions. For
an example Intel Core i7 processor [7.3], the pin capacitance is 3 pF, with an
operating voltage of 3.3V. The clock frequency used is 100 MHz. For this
configuration, the overall reduction in power is shown in Figure 7.5. This analysis
is done with the best inexact voter in terms of system accuracy, i.e. highest
transition reduction percentage among all the inexact circuits. The huge reduction
in the overall power dissipated can be easily understood as following the law of
diminishing returns [94].
The proposed inexactness in this application is, in effect, a sub-optimal
solution. It can even be said, to an extent, that it is always better to use inexact
circuitry in a low power bus coding system, since exact systems can never save
more power than they consume. This validates the advantage of inexactness in
the bus coding domain. Other circuit parameters are also significant. The
reduced critical path delay enables a higher operating frequency. Reduced area
has a direct impact on the leakage power, making the inexact circuits more
advantageous as technology is further scaled down. The case of the serial
transition inversion mentioned in Chapter 4 shows similar results, as its
functionality is similar to that of a pattern based serial bus invert.
7.2 Rank Order based Median Filter

A Hamming comparator based rank order median filter was analyzed with
the exact and inexact versions of the Hamming comparators. The voter used here
is 9 bits wide, as it has to vote on the data of nine neighboring pixels. The stimulus used
was a set of images. The hardware analysis of the different voters is shown in
Figure 7.6 and 7.7. As evidenced in the previous section, the power dissipation,
or any other hardware parameter, does not correlate with the level of inexactness.
Figure 7.6. Dynamic Power Dissipation varying with inexactness (9-bit)
Figure 7.7. Circuit Area varying with inexactness (9-bit)
The performance metric used is a blur metric proposed in [95]. Blur
annoyance on a picture was quantified by blurring it and comparing the variations
between neighboring pixels before and after the low-pass filtering step.
Consequently, the first step consists in the computation of the intensity variations
between neighboring pixels of the input image. On this same image, a low-pass
filter is applied and the variations between the neighboring pixels are computed.
Then, the comparison between these intensity variations allows the evaluation of
the blur annoyance. Thus, a high variation between the original and the blurred
image means that the original image was sharp, whereas a slight variation
between the original and the blurred image means that the original image was
already blurred. This metric takes values from 0 to 1, which stand for best and
worst respectively in terms of blur perception.
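A loose one-dimensional sketch of the idea behind this metric follows. It is greatly simplified from [95] (the actual metric operates on 2-D images, per direction, with a specific low-pass filter); the function name and the 3-tap averaging filter are our assumptions.

```python
def blur_metric_1d(signal):
    """Re-blur the signal and compare neighbouring-sample variations before
    and after: a sharp signal loses much of its variation when blurred,
    giving a low (good) score; an already-smooth one loses little (score
    near 1, worst)."""
    blurred = [signal[0]] + [
        (signal[i - 1] + signal[i] + signal[i + 1]) / 3
        for i in range(1, len(signal) - 1)] + [signal[-1]]
    var_in = sum(abs(signal[i] - signal[i - 1]) for i in range(1, len(signal)))
    var_bl = sum(abs(blurred[i] - blurred[i - 1]) for i in range(1, len(signal)))
    if var_in == 0:
        return 1.0
    # large drop in variation -> input was sharp -> low blur score
    return 1.0 - max(0.0, var_in - var_bl) / var_in
```

A rapidly alternating (sharp) signal scores lower than a smooth ramp, mirroring how the 2-D metric ranks sharp versus pre-blurred images.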
Figure 7.8. Blur metric varying with inexactness (Median Filter)
As shown in Figure 7.8, all the inexact circuits only perform marginally
worse than the exact voter. At the same time, the other extreme can also be seen
from the last data point on both graphs, which represents an inexact voter with an
area comparable to that of the exact version, and a very large inexactness. The
blur metric of the same is higher than that of all the other voters analyzed. This
demonstrates the extreme ends of the spectrum of inexactness.
The median filter with the inexact voter shows a small number of artifacts
in filtering. These artifacts occur mainly because the filter saturates due to the
nature of the rank order filter, i.e. a wrong decision is cascaded across the word
length, which results in the resultant pixel being all 0's or all 1's. So a hybrid
system was designed which checks whether or not the output of the inexact
system saturates. If it does, the median is recomputed with an exact system. This
makes use of power gating to switch on the exact system only when needed.
This leads to an increase in area, but still yields an enormous decrease in
power consumed, since the number of such instances is very small (< 5%).
The blur metric comparison for the hybrid version is shown in Figure 7.9.
However, if artifacts can be tolerated by the designer/user, a straightforward
inexact voter is sufficient in implementing the median filter.
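The hybrid fallback described above can be sketched as follows. The `inexactMedian` here is only a toy stand-in that saturates on some inputs for demonstration; the real inexact voter is the hardware circuit, and all function names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

uint8_t exactMedian(std::vector<uint8_t> win) {
    // Partial sort up to the middle element gives the median of the window.
    std::nth_element(win.begin(), win.begin() + win.size() / 2, win.end());
    return win[win.size() / 2];
}

// Toy stand-in for the inexact voter: mimics a wrong decision cascading
// across the word length, saturating to all 1's when the top bit is set.
uint8_t inexactMedian(const std::vector<uint8_t>& win) {
    uint8_t m = exactMedian(win);
    return (m & 0x80) ? 0xFF : m;
}

uint8_t hybridMedian(const std::vector<uint8_t>& win) {
    uint8_t m = inexactMedian(win);
    if (m == 0x00 || m == 0xFF)       // saturation detected:
        return exactMedian(win);      // power-gate the exact system on
    return m;
}
```

Because the exact path is exercised on fewer than 5% of windows, its contribution to average power stays small even though it is present in area.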
Figure 7.9. Blur metric varying with inexactness (Median Filter - Hybrid)
A sample blurred image is shown in Figure 7.10. The inexact circuit used
here is represented by the first data point in the above graphs. The figure
shows the original image, the image blurred with an exact voter, the image
blurred with an inexact voter, and the image blurred with a combination of the
exact and inexact voter. As mentioned earlier, white (saturated) artifacts can
be seen in Figure 7.10 (c), which are absent in Figure 7.10 (d).
Figure 7.10. Blurred Image Samples (a) Original image (b) Exact Median Rank (c) Inexact voter rank order filtered image (d) Hybrid voter rank order filtered image
7.3 Frequency based Blur Filter
The experimental setup for a DCT based blur filter is the same as with the
median filters. The only difference is that the inexactness is introduced in the
comparator used to decide the DCT coefficients. A set of images was
processed with the exact comparator as well as its inexact variants. The
extreme ends of inexactness were included to show the nature of the system in
question.
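A frequency-domain blur of this kind can be sketched on a single 8x8 block as follows. This assumes the comparator's role is to decide which DCT coefficients to keep, zeroing those whose frequency index exceeds a cutoff before the inverse transform; the naive O(N^4) DCT and all names are illustrative only.

```cpp
#include <cmath>

constexpr int N = 8;
const double PI = std::acos(-1.0);

double scale(int k) { return k == 0 ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N); }

// Forward 2-D DCT-II of one NxN block.
void dct(const double in[N][N], double out[N][N]) {
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v) {
            double s = 0.0;
            for (int x = 0; x < N; ++x)
                for (int y = 0; y < N; ++y)
                    s += in[x][y] * std::cos((2 * x + 1) * u * PI / (2 * N))
                                  * std::cos((2 * y + 1) * v * PI / (2 * N));
            out[u][v] = scale(u) * scale(v) * s;
        }
}

// Inverse 2-D DCT of one NxN block.
void idct(const double in[N][N], double out[N][N]) {
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y) {
            double s = 0.0;
            for (int u = 0; u < N; ++u)
                for (int v = 0; v < N; ++v)
                    s += scale(u) * scale(v) * in[u][v]
                       * std::cos((2 * x + 1) * u * PI / (2 * N))
                       * std::cos((2 * y + 1) * v * PI / (2 * N));
            out[x][y] = s;
        }
}

void blurBlock(double block[N][N], int cutoff) {
    double coef[N][N];
    dct(block, coef);
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v)
            if (u + v > cutoff)   // comparator decision: discard high frequencies
                coef[u][v] = 0.0;
    idct(coef, block);
}
```

An inexact comparator in place of the `u + v > cutoff` test would occasionally keep or drop the wrong coefficient, which is exactly the perturbation studied here.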
The hardware analysis results are shown in Figures 7.11 and 7.12.
Figure 7.11. Circuit Area varying with Inexactness (Comparator)
As seen earlier, the level of inexactness has no direct bearing on the
hardware parameters. For example, the areas of the inexact circuits
represented by data points 3 (inexactness of 8) and 8 (inexactness of 36) are
similar in magnitude, although the two differ greatly in their level of
inexactness. The power dissipation follows a similar trend, as explained in the
previous sections.
Figure 7.12. Dynamic Power Dissipation varying with Inexactness (Comparator)
Figure 7.13. Blur metric varying with inexactness (DCT based Filter)
The impact of the inexactness on the system accuracy can be seen in
Figure 7.13 in the form of a blur metric comparison; the blur metric was
explained in the earlier section. It can be seen that, except for data points 1
and 5 (inexactness 7 and 17 respectively), the inexact circuits give a better blur
than the exact variant. This does not necessarily mean that the inexact blurring
supersedes the exact one, but rather that, perceptually, the inexact circuits give
a blur that is less annoying than the exact blur while requiring less area.
However, the standard deviation in the blur metric is less than 10%, which is
satisfactory.
7.4 Translation Lookaside Buffer (TLB)
A TLB simulator with the aging replacement policy was implemented in
the C++ programming language [96] for experimentation. It was analyzed with
both the exact and inexact versions of the comparator. As mentioned in Chapter
4, the bit length of the age parameter is 16 bits. The comparator is built from the
one designed for the DCT based blur given in Section 7.3. SPEC program traces
were used to obtain the page hit ratio for the exact and inexact comparators.
A uniform page size of 4 KB is used throughout this study, the TLB has 64
entries, and the analysis is done for 10,000 memory requests. Figure 7.14
shows the percentage of page faults varying with inexactness.
Figure 7.14. Page Fault varying with Inexactness
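The replacement step of such a simulator can be sketched as below, assuming a 64-entry fully associative TLB with a 16-bit age per entry: ages are halved on every access and the referenced entry's most significant bit is set, so the smallest age marks the least recently used entry. The comparator that finds this minimum is the circuit made inexact in the experiments; here it is the plain `<` comparison, and all names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TlbEntry { uint32_t vpn = 0; uint16_t age = 0; bool valid = false; };

class Tlb {
    std::vector<TlbEntry> entries;
public:
    explicit Tlb(std::size_t n = 64) : entries(n) {}

    // Returns true on a hit; on a miss, the minimum-age (or an invalid)
    // entry is evicted and replaced.
    bool access(uint32_t vpn) {
        int hit = -1;
        std::size_t victim = 0;
        for (std::size_t i = 0; i < entries.size(); ++i) {
            if (entries[i].valid && entries[i].vpn == vpn) hit = int(i);
            if (!entries[i].valid ||
                (entries[victim].valid && entries[i].age < entries[victim].age))
                victim = i;   // exact comparator decision
        }
        for (auto& e : entries) e.age >>= 1;         // age every entry
        if (hit >= 0) { entries[std::size_t(hit)].age |= 0x8000; return true; }
        entries[victim] = {vpn, 0x8000, true};
        return false;
    }
};
```

With the 4 KB page size used here, the virtual page number is simply the address shifted right by 12 bits; an inexact comparator substituted into the victim selection occasionally evicts a slightly wrong entry, producing the extra faults measured above.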
As seen in the figure, the increase in the number of page faults (at most
1.5%) is negligible compared to the gains provided in terms of the hardware
requirements, shown in Figure 7.15. Even if the overall power dissipated in
retrieving the extra pages is larger, the inexactness provides a smaller area
and a greater speed of operation; in this case, inexactness may not prove
advantageous when judged in the power domain alone. Even though Figures
7.15 and 7.16 seem to show a trend, this is not always guaranteed, as seen
from the earlier analysis.
Figure 7.15. Circuit Area varying with inexactness (Comparator)
Figure 7.16. Dynamic Power Dissipation varying with inexactness (Comparator)
As evidenced in this chapter, introducing inexactness in the above
applications provides enormous gains in hardware, while having negligible or
tolerable effects on system performance.
Chapter 8
Conclusions
This chapter presents a summary of the work done in building inexact
applications. Following this, a brief discussion on applications incurring
intolerable loss to system accuracy due to inexactness is presented. The chapter
is then concluded with inferences drawn from the results obtained.
8.1 Summary of Work
In this work, several applications were designed in an inexact fashion, with
varying levels of inexactness, and were compared and contrasted with the exact
version of the application. A novel, orthogonal level of circuit optimization for
power, using inexact logic, has been presented. A set of general guidelines is
given for designing inexact circuits on a per-case basis. To simplify the
process, a heuristic framework to generate inexact circuits has been
implemented, with which multiple inexact versions of different decision circuits
were generated and tested in application scenarios having no impact, tolerable
impact, or significant impact on system accuracy, respectively.
An inexact Hamming threshold voter and a comparator have been
designed and tested with the following set of applications. Bus coding, median
filter based image blurring, and N-Modular Redundancy (NMR) were chosen
as applications for the Hamming threshold voter. The inexact comparators were
tested with the LRU replacement algorithm for Translation Lookaside Buffers
(TLBs), and a DCT threshold based image blurring process.
A comparison has been made between the various inexact circuits and
the exact circuit, taking the levels of inexactness, the power dissipation of the
system, and the system performance/accuracy as parameters for comparison.
The results obtained promise a drastic reduction in the power dissipation of the
circuit, up to 300%, on introducing inexactness. However, the reduction in power
and the level of inexactness are not correlated: power reduction in fact occurs
only if the modification made to the logic function tends to reduce the number of
literals in the Boolean minimization. The level of inexactness, however, relates
more closely to the system performance. Although critical path delay was not a
parameter for optimization, a gain of up to 30% was observed on this front. The
area and static power dissipation also reduce significantly.
8.2 Applications affected by Inexactness
This category of applications is qualitatively discussed in this section.
Examples of N-Modular Redundancy (NMR) and Error Correcting Codes (ECC)
are considered in elucidating the impact of inexactness in such applications.
In NMR systems, the circuit which is to be fault tolerant is replicated N
times, and the outputs of these N circuits are fed to a voter circuit. The voters
used include the majority voter, the plurality voter, the median voter, etc. In this
case, the decision of the voter determines whether the fault incurred in the
system is tolerable or not. A special case of this is the popular Triple Modular
Redundancy (TMR).
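For the TMR special case, an exact majority voter can be written bitwise: each output bit is the majority of the three replicas' corresponding bits, Maj(a, b, c) = ab + bc + ca. A minimal sketch, with an illustrative function name:

```cpp
#include <cstdint>

// Bitwise TMR majority voter: a single faulty replica is outvoted bit by bit.
uint16_t tmrVote(uint16_t a, uint16_t b, uint16_t c) {
    return (a & b) | (b & c) | (c & a);
}
```

An inexact voter would replace this function with an approximation of it, which is precisely why, as argued below, doing so can defeat the purpose of the redundancy.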
Using an inexact voter in this case may not be justified, as redundancy is
applied precisely to obtain more reliability. This is especially true in critical
applications, such as control circuits in space applications and nuclear
reactors, where extra effort is put in, at increased cost, so that system failure is
minimal, owing to the cost involved in carrying out such missions. In the case
of redundant sensors, however, the output of each sensor is subject to process
variation and may not render the exactly correct output; each sensor may give
an output slightly deviated from the exact value. Thus, even the use of an exact
majority voter may not produce the expected output, so inexact voting
solutions may not hinder the correctness of the system output. This, however,
is speculative.
In the case of Error Correcting Codes, or error detection, the use of an
inexact circuit will falsely detect errors, or miss errors that would otherwise be
caught by the system. This may lead to inaccurate error correction, or to
unnecessary re-transmission of data. However, as mentioned earlier, inexact
circuits can be used in concurrent error detection [57], where they can provide
a reduced vector coverage for detecting errors.
8.3 Inference from Results
In the case of the majority voter for Bus Invert, the inexact versions of the
circuit are necessary to facilitate power reduction, more so in the case of on-chip
buses. The impact on system performance, in terms of decreased power
reduction from the coding, is negligible compared to the gains achieved in terms
of system power. An inexact voter based median rank filter performs credibly for
certain levels of inexactness, but results in a few image artifacts, for which a
workaround has been proposed in the form of a hybrid system that falls back on
an exact voter when a certain criterion is satisfied. An NMR system performs
poorly on introducing inexactness, as its primary function is to maintain system
accuracy.
For the comparator, the frequency based image blurring process proved
tolerant to the impact of inexactness; the colour distribution of the image was
retained. For the page replacement algorithm, there was a negligible increase in
the miss rate, but the penalty of incurring the misses may not be nullified by the
power reduction of the comparator.
An interesting study would be to analyze the impact of process variation on
such inexact circuits. Since the inexactness is at the functional level, such circuits
are affected by process variation in the same manner as exact circuits. However,
the errors introduced can be constructive or destructive, in the sense that the
error may fall in the part of the circuit already yielding a false result, or in the part
of the circuit that is retained from the exact version. This can be alleviated by
increasing the length of the part of the circuit (as in [97]) contributing to the
inexact output, so that it is not affected by delay errors caused by process
variation. Errors caused in the "inexact part" can be neglected, as they lead to a
double negative which eventually results in the correct output, thereby reducing
the level of inexactness without affecting the reduction in power dissipation.
In conclusion, this thesis contends that the introduction of inexactness as an
orthogonal layer of optimization can provide significant returns in terms of power
reduction and increased speed of operation. The penalty incurred on introducing
inexactness is a design trade-off which has to be analyzed on a per-case basis.
By incorporating this trade-off into the optimization parameters of the heuristic
framework, the design effort can be reduced. The final circuit generated depends
on the optimization parameters and the acceptable error tolerance of the system;
different inexact circuits can be used depending on the system requirements in
terms of accuracy and power.
Bibliography
[1] Microsoft Architecture Journal, Vol. 18, Theme: Green Computing. http://msdn.microsoft.com/en-us/architecture/bb410935.aspx
[2] Martin, T. L., Siewiorek, D. P., Smailagic, A., Bosworth, M., Ettus, M., and Warren, J., "A case study of a system-level approach to power-aware computing," ACM Trans. Embed. Comput. Syst., Vol. 2, No. 3, Aug. 2003, pp. 255-276. DOI: http://doi.acm.org/10.1145/860176.860178
[3] Mircea R. Stan and Kevin Skadron, "Guest Editors' Introduction: Power-Aware Computing," IEEE Computer, Vol. 36, No. 12, Dec. 2003, pp. 35-38. doi:10.1109/MC.2003.1250876
[4] Moore, Gordon E., "Cramming more components onto integrated circuits," Electronics Magazine, 1965, p. 4.
[5] Semiconductor Industry Association, International Technology Roadmap for Semiconductors. http://www.itrs.net/
[6] The New York Times, Technology section, April 1, 2008, "Light and Cheap, Netbooks Are Poised to Reshape PC Industry."
[7] G. Gerosa, "A Sub-1W to 2W Low-Power IA Processor for Mobile Internet Devices and Ultra-Mobile PCs in 45nm High-K Metal-Gate CMOS," Proceedings of ISSCC 2008.
[8] The Green500 List. http://www.green500.org/lists.php
[9] Sushant Sharma, Chung-Hsing Hsu, and Wu-chun Feng, "Making a case for a Green500 list," 2nd IEEE IPDPS Workshop on High-Performance, Power-Aware Computing, April 2006.
[10] Ban P. Wong, Anurag Mittal, Yu Cao, and Greg Starr, "Nano-CMOS scaling problems and implications," in Nano-CMOS Circuit and Physical Design, John Wiley & Sons, Inc.
[11] J. M. Rabaey, Digital Integrated Circuits, Prentice Hall, 1996.
[12] Li, H., Bhunia, S., Chen, Y., Vijaykumar, T. N., and Roy, K., "Deterministic Clock Gating for Microprocessor Power Reduction," Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA), IEEE Computer Society, Washington, DC, 2003, p. 113.
[13] Anita Lungu, Pradip Bose, Alper Buyuktosunoglu, and Daniel J. Sorin, "Dynamic power gating with quality guarantees," ISLPED 2009, pp. 377-382.
[14] De-Shiuan Chiou, Shih-Hsin Chen, and Chingwei Yeh, "Timing driven power gating," Proceedings of the 43rd Design Automation Conference, ACM, 2006, pp. 121-124.
[15] Perloff, Microeconomics: Theory and Applications with Calculus, Pearson, 2008, p. 178.
[16] Kaushik Roy and Sharat Prasad, Low-Power CMOS VLSI Circuit Design, Wiley, Feb. 2000.
[17] Anantha Chandrakasan and Robert W. Brodersen, Low-Power CMOS Design, Wiley, Feb. 1998.
[18] Abinesh R., Bharghava R., and M. B. Srinivas, "Transition Inversion Based Low Power Data Coding Scheme for Synchronous Serial Communication," IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2009, pp. 103-108.
[19] Abinesh R., Bharghava R., Purini, S., and Regeti, G., "Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting," Proceedings of the 23rd International Conference on VLSI Design (VLSID), IEEE Computer Society, Washington, DC, 2010, pp. 158-163.
[20] Abinesh R., Bharghava R., Suresh Purini, and Govindarajulu Regeti, "Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting," J. Low Power Electronics (JOLPE), Vol. 6, No. 3, October 2010.
[21] Hayes, J. P., Digital Logic Design, Addison Wesley, 1993.
[22] W. V. Quine, "A Way to Simplify Truth Functions," American Mathematical Monthly, Vol. 62, 1955, pp. 627-631.
[23] E. J. McCluskey, "Minimization of Boolean Functions," Bell Systems Technical Journal, Vol. 35, November 1956, pp. 1417-1444.
[24] M. Karnaugh, "The map method for synthesis of combinational logic circuits," AIEE Trans., Vol. 72, No. 9, September 1953, pp. 593-599.
[25] Brayton, R. K., Sangiovanni-Vincentelli, A. L., McMullen, C. T., and Hachtel, G. D., Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, 1984.
[26] Lewin, Douglas, Design of Logic Systems, Van Nostrand (UK), 1985.
[27] Lala, Parag K., Practical Digital Logic Design and Testing, Prentice Hall, 1996.
[28] C. A. Coello Coello, A. D. Christiansen, and A. A. Hernández, "Automated Design of Combinational Logic Circuits using Genetic Algorithms," Proceedings of the International Conference on Artificial Neural Nets and Genetic Algorithms (ICANNGA'97), Springer Verlag, 1997, pp. 335-338.
[29] J. F. Miller, D. Job, and V. K. Vassilev, "Principles in the Evolutionary Design of Digital Circuits," Genetic Programming and Evolvable Machines, Vol. 1, No. 3, 2000, pp. 259-288.
[30] C. A. Coello Coello, E. H. Luna, and A. H. Aguirre, "A Comparative Study of Encodings to Design Combinational Logic Circuits Using Particle Swarm Optimization," 2004 NASA/DoD Conference on Evolvable Hardware, Seattle, Washington, USA, June 24-26, 2004, pp. 71-78.
[31] V. G. Gudise and G. K. Venayagamoorthy, "Evolving Digital Circuits Using Particle Swarm," Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, Portland, OR, USA, July 20-24, 2003, pp. 468-472.
[32] P. W. Moore and G. K. Venayagamoorthy, "Evolving Digital Circuits Using Hybrid Particle Swarm Optimization and Differential Evolution," Conference on Neuro-Computing and Evolving Intelligence, Auckland, New Zealand, December 13-15, 2004, pp. 71-73.
[33] Kabanets, Valentine and Cai, Jin-Yi, "Circuit minimization problem," Proc. 32nd Symposium on Theory of Computing, Portland, Oregon, USA, 2000, pp. 73-79. doi:10.1145/335305.335314, ECCC TR99-045
[34] M. Mano and C. Kime, Logic and Computer Design Fundamentals (Fourth Edition), p. 54.
[35] Enderton, H., A Mathematical Introduction to Logic (Second Edition), Harcourt Academic Press, 2001.
[36] Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Co., 1989.
[37] King, D. B. S., Simpson, R. J., Moore, C., and MacDiarmid, I. P., "Digital n-tuple Hamming comparator for weightless systems," Electron. Lett., Vol. 34, No. 22, 1998, pp. 2103-2104.
[38] Pedroni, V. A., "Compact fixed-threshold and two-vector Hamming comparators," Electron. Lett., Vol. 39, No. 24, 2003, pp. 1705-1706.
[39] Piestrak, S. J., "Design of self-testing checkers for unidirectional error detecting codes," Scientific Papers of the Institute of Technical Cybernetics of Wroclaw University of Technology, No. 92, Series: Monographs No. 24, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 1995.
[40] King, D. B. S., Simpson, R. J., Moore, C., and MacDiarmid, I. P., "Hamming value comparator hierarchies," Electron. Lett., Vol. 35, No. 11, 1999, pp. 910-911.
[41] Asada, K., Komatsu, S., and Ikeda, M., "Associative memory with minimum Hamming distance detector and its application to bus data encoding," Proc. IEEE Asia-Pacific ASIC Conf. (AP-ASIC'99), 1999.
[42] Barral, C., Coron, J.-S., and Naccache, D., "Externalised fingerprint matching," Lect. Notes Comput. Sci., Vol. 3072, 2004, pp. 309-315.
[43] Chen, K., "Bit-serial realisations of a class of nonlinear filters based on positive Boolean functions," IEEE Trans. Circuits Syst., Vol. 36, No. 6, 1989, pp. 785-794.
[44] Karaman, M., Onural, L., and Atalar, A., "Design and implementation of a general purpose median filter unit in CMOS VLSI," IEEE J. Solid-State Circuits, Vol. 25, No. 2, 1990, pp. 505-513.
[45] Chua-Chin Wang, Ya-Hsin Hsueh, Hsin-Long Wu, and Chih-Feng Wu, "A Fast Dynamic 64-bit Comparator with Small Transistor Count," VLSI Design, Vol. 14, No. 4, 2002, pp. 389-395.
[46] Minsu Kim, Joo-Young Kim, and Hoi-Jun Yoo, "A 1.55ns 0.015 mm2 64-bit quad number comparator," International Symposium on VLSI Design, Automation and Test (VLSI-DAT '09), 2009, pp. 283-286.
[47] Dorit S. Hochbaum (ed.), Approximation Algorithms for NP-Hard Problems, PWS Publishing Company, 1997, Chapter 9: "Various Notions of Approximations: Good, Better, Best, and More." ISBN 0-534-94968-1.
[48] Vazirani, Vijay V., Approximation Algorithms, Springer, Berlin, 2003. ISBN 3540653678.
[49] Krishna V. Palem, Lakshmi N. Chakrapani, Zvi M. Kedem, Lingamneni Avinash, and Kirthi Krishna Muntimadugu, "Sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects," CASES 2009, pp. 1-10.
[50] Lakshmi N. Chakrapani, Pinar Korkmaz, Bilge E. S. Akgul, and Krishna V. Palem, "Probabilistic system-on-a-chip architectures," ACM Trans. Design Autom. Electr. Syst., Vol. 12, No. 3, 2007.
[51] Bilge E. S. Akgul, Lakshmi N. Chakrapani, Pinar Korkmaz, and Krishna V. Palem, "Probabilistic CMOS Technology: A Survey and Future Directions," VLSI-SoC 2006, pp. 1-6.
[52] D. MacKay, "Bayesian interpolation," Neural Computation, Vol. 4, No. 3, 1992.
[53] H. Fuks, "Non-deterministic density classification with diffusive probabilistic cellular automata," Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, Vol. 66, 2002.
[54] E. Gelenbe, "Random neural networks with negative and positive signals and product form solution," Neural Computation, Vol. 1, No. 4, 1989, pp. 502-511.
[55] Y. Z. Ding and M. O. Rabin, "Hyper-Encryption and everlasting security," Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science, Vol. 2285, 2002, pp. 1-26.
[56] D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG4-AVC standard and its fidelity range extensions," IEEE Communications Magazine, Sept. 2005.
[57] Choudhury, M. R. and Mohanram, K., "Approximate logic circuits for low overhead, non-intrusive concurrent error detection," Proceedings of the Conference on Design, Automation and Test in Europe (DATE '08), ACM, New York, NY, 2008, pp. 903-908. DOI: http://doi.acm.org/10.1145/1403375.1403593
[58] M. R. Stan and W. P. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 1, March 1995, pp. 49-58.
[59] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems," IEEE 7th Great Lakes Symposium on VLSI, Urbana, IL, Mar. 1997, pp. 77-82.
[60] E. Musoll, T. Lang, and J. Cortadella, "Working-Zone Encoding for reducing the energy in microprocessor address buses," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 6, No. 4, Dec. 1998.
[61] El-Moursy, M. A. and Friedman, E. G., "Wire shaping of RLC interconnects," Integration, the VLSI Journal, Vol. 40, No. 4, Jul. 2007, pp. 461-472.
[62] El-Moursy, M. A. and Friedman, E. G., "Optimum wire sizing of RLC interconnect with repeaters," Proceedings of the 13th ACM Great Lakes Symposium on VLSI (GLSVLSI '03), ACM, New York, NY, 2003, pp. 27-32.
[63] W. Fornaciari, M. Polentarutti, D. Sciuto, and C. Silvano, "Power Optimization of System-Level Address Buses Based on Software Profiling," CODES, 2000, pp. 29-33.
[64] E. Musoll, T. Lang, and J. Cortadella, "Exploiting the locality of memory references to reduce the address bus energy," Proceedings of the International Symposium on Low Power Electronics and Design, Monterey, CA, August 1997, pp. 202-207.
[65] Jun Yang, Rajiv Gupta, and Chuanjun Zhang, "Frequent value encoding for low power data buses," ACM Trans. Design Autom. Electr. Syst., Vol. 9, No. 3, 2004, pp. 354-384.
[66] C. Su, C. Tsui, and A. Despain, "Saving power in the control path of embedded processors," IEEE Design and Test of Computers, Vol. 11, No. 4, 1994, pp. 24-30.
[67] Wei-Chung Cheng and Massoud Pedram, "Memory Bus Encoding for Low Power: A Tutorial," ISQED 2001, pp. 199-204.
[68] Tanenbaum, Andrew S., Modern Operating Systems (Second Edition), Prentice-Hall, New Jersey, 2001.
[69] Aho, Denning, and Ullman, "Principles of Optimal Page Replacement," Journal of the ACM, Vol. 18, No. 1, January 1971, pp. 80-93.
[70] Elizabeth J. O'Neil et al., "The LRU-K page replacement algorithm for database disk buffering," ACM SIGMOD Conference, 1993, pp. 297-306.
[71] Song Jiang and Xiaodong Zhang, "LIRS: a Low Inter-reference Recency Set replacement," SIGMETRICS 2002.
[72] Richard W. Carr and John L. Hennessy, "WSCLOCK: a simple and effective algorithm for virtual memory management," 1981.
[73] Ballapuram, C., Puttaswamy, K., Loh, G. H., and Lee, H. S., "Entropy-based low power data TLB design," Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2006, pp. 304-311.
[74] Rhodehamel, Michael W., "The Bus Interface and Paging Units of the i860 Microprocessor," Proc. IEEE International Conference on Computer Design, 1989, pp. 380-384.
[75] B. K. Kar and D. K. Pradhan, "A new algorithm for order statistic and sorting," IEEE Transactions on Signal Processing, Vol. 41, No. 8, Aug. 1993, pp. 2688-2699.
[76] K. Oflazer, "Design and implementation of a single-chip 1-D median filter," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 31, No. 5, Oct. 1983, pp. 1164-1168.
[77] J. P. Fitch, E. J. Coyle, and N. C. Gallagher, "Median filtering by threshold decomposition," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, Dec. 1984, pp. 1183-1188.
[78] L. W. Chang and S. S. Yu, "A new implementation of generalized order statistic filter by threshold decomposition," IEEE Transactions on Signal Processing, Vol. 40, No. 12, Dec. 1992, pp. 3062-3066.
[79] Volnei A. Pedroni, "Compact Hamming-comparator-based rank order filter for digital VLSI and FPGA implementations," ISCAS (2) 2004, pp. 585-588.
[80] IEEE Trans. Circuits and Systems, Special Issue on Digital Filtering and Image Processing, Vol. CAS-2, 1975.
[81] R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4.
[82] R. Hamming, Digital Filters, Prentice-Hall, 1983.
[83] Miller, J. F., Job, D., and Vassilev, V. K., "Principles in the Evolutionary Design of Digital Circuits, Part I," Genetic Programming and Evolvable Machines, Vol. 1, No. 1-2, Apr. 2000, pp. 7-35.
[84] Miller, J. F., Job, D., and Vassilev, V. K., "Principles in the Evolutionary Design of Digital Circuits, Part II," Genetic Programming and Evolvable Machines, Vol. 1, No. 3, Jul. 2000, pp. 259-288.
[85] Miller, J. F. and Harding, S. L., "Cartesian genetic programming," Proceedings of the 2008 GECCO Conference Companion on Genetic and Evolutionary Computation, ACM, New York, NY, 2008, pp. 2701-2726.
[86] Xu, H., Ding, Y., and Hu, Z., "Adaptive immune genetic algorithm for logic circuit design," Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation, Shanghai, China, June 12-14, 2009, pp. 639-644.
[87] T. Back, F. Hoffmeister, and H.-P. Schwefel, "A survey of evolution strategies," Proceedings of the 4th International Conference on Genetic Algorithms, 1991, pp. 2-9.
[88] Miller, J. F., "An empirical study of the efficiency of learning Boolean functions using a Cartesian Genetic Programming approach," Proceedings of the 1st Genetic and Evolutionary Computation Conference (GECCO'99), pp. 1135-1142.
[89] Krohling, R., Zhou, Y., and Tyrrell, A., "Evolving FPGA-based robot controllers using an evolutionary algorithm," Proc. Intl. Conf. on Artificial Immune Systems, 2002, pp. 41-46.
[90] Bhagat, Phiroz, Pattern Recognition in Industry, Elsevier. ISBN 0-08-044538-1.
[91] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, 2005. ISBN 0-321-32136-7.
[92] Thomas, Donald and Moorby, Phillip, The Verilog Hardware Description Language, Kluwer Academic Publishers, Norwell, MA.
[93] http://www.spec2000.com/
[94] Samuelson and Nordhaus, Microeconomics (17th ed.), McGraw Hill, 2001, p. 110.
[95] Frederique Crete, Thierry Dolmiere, Patricia Ladret, and Marina Nicolas, "The blur effect: perception and estimation with a new no-reference perceptual blur metric," Proceedings of SPIE 6492, 64920I, 2007.
[96] Stroustrup, Bjarne, The C++ Programming Language (Third Edition), Addison-Wesley, 1997.
[97] Banerjee, N., Karakonstantis, G., and Roy, K., "Process variation tolerant low power DCT architecture," Proceedings of the Conference on Design, Automation and Test in Europe, 2007, pp. 630-635.
[98] Latif-Shabgahi, G., Bass, J. M., and Bennett, S., "Efficient implementation of inexact majority and median voters," Electronics Letters, Vol. 36, No. 15, 20 Jul. 2000, pp. 1326-1328.
[99] Sahni, S. and Gonzales, T., "P-complete problems and approximate solutions," Proceedings of the 15th Annual Symposium on Switching and Automata Theory, 1974.
[100] Hromkovič, J., Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics, Springer-Verlag, New York, 2001.
[101] I. B. Gurevich and Yu. I. Zhuravlev, "Minimization of Boolean functions and effective recognition algorithms," Cybernetics and Systems Analysis, Vol. 10, No. 3, 1974, pp. 393-397.
[102] Oppenheim, A. V., Schafer, R. W., and Buck, J. R., Discrete-Time Signal Processing (2nd Ed.), Prentice-Hall, 1999.