Design of Low Power Applications using Inexact Logic Circuits
By
Bharghava R
200741005
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science (by Research) in
VLSI & Embedded Systems
Centre for VLSI & Embedded Systems Technologies International Institute of Information Technology
Hyderabad, India May 2010
Copyright © 2010 Bharghava R All Rights Reserved
Dedicated to my parents, Uma and Rajaram, without whom I would not be…
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled “Design of Low Power
Applications using Inexact Logic Circuits” by Bharghava R (200741005) submitted in
partial fulfilment for the award of the degree of Master of Science (by Research) in VLSI
& Embedded Systems, has been carried out under our supervision and it is not submitted
elsewhere for a degree.
_____________
Date

_____________
Advisor: Dr. Suresh Purini
Asst. Professor, IIIT, Hyderabad

_____________
Date

_____________
Advisor: Prof. Govindarajulu
Professor, IIIT, Hyderabad
Acknowledgements
I owe my deepest gratitude to my advisors, Dr. Suresh Purini and Professor
Govindarajulu, whose constant encouragement, guidance and support enabled
me to accomplish this work.
I also thank Prof. M. Satyam for his feedback on various aspects of my work. I
am exceptionally thankful to Abinesh for his valuable help and feedback on my
work. I would like to thank Avi Dullu and Mukund Ramakrishna for their
contribution to this thesis. I would also like to thank all my friends in the CVEST lab for the
terrific company during my study.
Finally, I want to thank my family for their unconditional love. Their constant
encouragement and their faith in me have always given me the strength to try to
achieve more and to be a better person.
Abstract
Ever since the introduction of integrated circuits into mainstream usage,
numerous research efforts have been made into optimizing the circuits
implemented on silicon with respect to Power, Area, Time (PAT), and, most
recently, reliability. Due to the recent surge in mobile devices, low power
design techniques are being given more emphasis. Power dissipation can be
reduced at the device, circuit, system architecture, or software design levels.
This work focuses on reducing power dissipation in electronic systems
by using inexact logic to implement systems where error can be tolerated
or neglected. An inexact logic circuit is
constructed by selectively converting minterms/maxterms of its Boolean function
into don’t cares. The intuition in doing this is that by converting only a small
fraction of minterms/maxterms into don’t-cares, an inexact version of the Boolean
function can be synthesized with a significantly lower area-power-delay footprint
than the exact Boolean function.
Decision making circuits, which are generally sensitive to errors, have
been chosen as the subject of analysis. Several applications are presented
where inexactness can be applied, and their performance, in terms of power and
system accuracy, is quantified with varying levels of inexactness. This thesis also
outlines a set of general guidelines to design inexact circuits on a per-case basis.
To simplify the process, a heuristic framework to generate inexact circuits with
varying levels of inexactness has been implemented.
The above mentioned applications are classified into three categories based
on the impact of the decision made on system accuracy: no impact, tolerable
impact, and significant impact. For applications with no impact, the only
deciding criteria in introducing inexactness are the overall improvements in
power and speed. For applications under the
tolerable impact category, a system accuracy parameter should also be
considered in validating an inexact circuit. For applications with significant impact,
inexactness is seldom tolerated, as system accuracy is critical.
Bus coding, Median Filter based Image blurring, and N-Modular
Redundancy (NMR), falling under the no impact, tolerable impact, and significant
impact categories respectively, share the majority voter as a common decision
circuit. A Least Recently Used (LRU)-variant replacement algorithm for
Translation Lookaside Buffers (TLB), and a DCT threshold based image blurring
process, with the former having no impact on system accuracy and the latter
having tolerable impact, share a comparator as a decision circuit. Different
inexact versions of these decision circuits are generated and their performance
parameters with respect to the applications are measured.
A comparison is made between the levels of inexactness, the power
dissipation of the system, and the system performance. The results
obtained promise a drastic reduction in power dissipation, up to 300%, with
tolerable deviation in system accuracy. Although critical path delay was not a
parameter for optimization, a gain of up to 30% was observed on this front. The
chip area and static power dissipation are also reduced significantly.
The results obtained validate the use of inexact logic in applications which
are either impervious or tolerant to the errors produced due to the inexactness.
This provides a new avenue for low power system design, which can be used in
conjunction with other circuit level and system level design techniques.
Contents

Contents
List of Tables
List of Figures
List of Relevant Publications

Chapter 1: Introduction
1.1 Power Dissipation in Integrated Circuits
1.2 Low Power Systems
1.2.1 System Architecture Level
1.2.2 Circuit Level
1.3 Proposed Technique – Inexact Circuit Design
1.4 Organization of the Thesis

Chapter 2: For Basic Understanding
2.1 Digital Electronics
2.2 Circuit Minimization
2.3 Truth Table
2.4 Karnaugh Map
2.5 Genetic Algorithms
2.6 Hamming Threshold Voter
2.7 Digital Comparator

Chapter 3: Inexact Circuit Design
3.1 Related Work
3.2 Concept of Inexactness

Chapter 4: Applications under the No Impact Category
4.1 Bus Coding
4.2 Translation Lookaside Buffer (TLB)

Chapter 5: Applications under the Tolerable Impact Category
5.1 Majority Voter based Rank Order Median Filter
5.2 Frequency based Image Blurring

Chapter 6: Design Methodology of Inexact Circuits
6.1 K-Map based Approach
6.2 Heuristic Framework for Inexact Circuit Generation
6.3 Other Methods to Design Inexact Circuits

Chapter 7: Results
7.1 Bus Coding
7.2 Rank Order based Median Filter
7.3 Frequency based Blur Filter
7.4 Translation Lookaside Buffer (TLB)

Chapter 8: Conclusions
8.1 Summary of Work
8.2 Applications affected by Inexactness
8.3 Inference from Results

Bibliography

List of Tables

Table 2.1 Bus Invert Decision Making
Table 2.2 Bus Invert Decision Encoder

List of Figures

Figure 2.1 Example of Circuit Minimization
Figure 2.2 Truth Table & K-Map of a Full Adder
Figure 2.3 A four variable minterm Karnaugh map
Figure 2.4 4 set Venn diagram with numbers (0-15) and set names (A-D)
Figure 3.1 (a) Exact K-Map (b) Inexact K-Map
Figure 4.1 Bus Invert Block Diagram
Figure 4.2 K-Map of an Exact Majority Voter
Figure 4.3 K-Map of an Inexact Majority Voter
Figure 4.4 8-bit Majority Voter Circuit
Figure 4.5 3-out-of-4 Block
Figure 4.6 8-bit Inexact Majority Voter Circuit
Figure 4.7 High Level Architecture for the Transition Inversion scheme
Figure 4.8 Aging Page Replacement Algorithm Illustrated
Figure 5.1 Rank-order filter algorithm (median detection, for n=5)
Figure 5.2 Complete diagram of the bit-serial generic rank filter
Figure 5.3 Diagram of the logic unit (LU)
Figure 5.4 Exact Comparator
Figure 5.5 Inexact Comparator
Figure 7.1 Percentage reduction in transitions with varying inexactness
Figure 7.2 Dynamic Power Dissipation varying with inexactness (8-bit)
Figure 7.3 Circuit Area varying with inexactness (8-bit)
Figure 7.4 Critical Path Delay varying with inexactness (8-bit)
Figure 7.5 Comparison of overall power saved by the Bus Invert with inexactness
Figure 7.6 Dynamic Power Dissipation varying with inexactness (9-bit)
Figure 7.7 Circuit Area varying with inexactness (9-bit)
Figure 7.8 Blur metric varying with inexactness (Median Filter)
Figure 7.9 Blur metric varying with inexactness (Median Filter - Hybrid)
Figure 7.10 Blurred Image Samples
Figure 7.11 Circuit Area varying with Inexactness (Comparator)
Figure 7.12 Dynamic Power Dissipation varying with Inexactness (Comparator)
Figure 7.13 Blur metric varying with inexactness (DCT based Filter)
Figure 7.14 Page Fault varying with Inexactness
Figure 7.15 Circuit Area varying with inexactness (Comparator)
Figure 7.16 Dynamic Power Dissipation varying with inexactness (Comparator)
List of Relevant Publications
• Bharghava R., Abinesh R., Suresh Purini, Govindarajulu Regeti, “Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting”, selected for publication in a special issue of the Journal of Low Power Electronics, to appear in October 2010.
• Bharghava R., Abinesh R., Suresh Purini, Govindarajulu Regeti, “Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting”, 23rd International Conference on VLSI Design, January 2010.
• Abinesh R., Bharghava R., M.B. Srinivas, “Transition Inversion Based Low Power Data Coding Scheme for Synchronous Serial Communication”, ISVLSI 2009, pp. 103-108, IEEE Computer Society Annual Symposium on VLSI, 2009.
• Joint Winner of the Intel Research Challenge (also known as the Intel Scholar Program), 2008-2009. http://www.intel.com/cd/corporate/education/APAC/ENG/in/news/news43/419015.htm
Chapter 1
Introduction
The field of electronics has undergone several transformations in order to
cater to the needs of human society. Several innovations in device technology,
circuit design methodologies, and system architecture have resulted in
systems with increasing performance over the years. Critical parameters
considered during the design of electronic systems are the speed of operation
(in terms of operating frequency), the power dissipated by the system, the area
occupied by the circuit in silicon, and the reliability of the system. Earlier research
interest was in designing high speed systems, but this has given way to the
design of highly energy efficient systems [1-3] in the last half decade. Of late, the
advent of mobile devices and portable computing platforms, the increasing threat of
energy shortage, and e-junk in the form of batteries and other toxic substances
used in manufacturing have increased the emphasis on low power design.
Battery driven devices have made power consumption a significant parameter to
be incorporated into system design, rather than something addressed afterwards
by adding power management features.
The increased interest shown in low power systems is also a direct result of
the infiltration of Green Computing into mainstream electronic devices. Green
Computing or Green IT refers to environmentally sustainable computing or IT. It
is "the study and practice of designing, manufacturing, using, and disposing of
computers, servers, and associated subsystems—such as monitors, printers,
storage devices, and networking and communications systems—efficiently and
effectively with minimal or no impact on the environment.” Unlike energy
conservation in the broader sense, in the context of VLSI this involves the design
of systems consuming less power than their present counterparts.
1.1 Power Dissipation in Integrated Circuits
An integrated circuit chip contains many capacitive loads, formed both
intentionally (as is the case with gate to channel capacitance) and unintentionally
(between any conductors that are near each other but not electrically connected).
Changing the state of the circuit causes a change in the voltage across these
parasitic capacitances, which involves a change in the amount of stored energy.
As the capacitive loads are charged and discharged through resistive devices, an
amount of energy comparable to that stored in the capacitor is dissipated as heat,
as shown in the following expression:

E_stored = (1/2) C V^2

where E_stored is the energy stored in the capacitor in the form of electrostatic
charge, C is the capacitance formed as mentioned above, and V is the voltage of
operation. The power dissipated is given by the energy dissipated per unit time,
which adds the frequency f and the switching factor α to the right hand side of the
equation:

P = α f (1/2) C V^2

The switching factor α is the average number of state changes made at
that capacitance node per cycle; this, multiplied by the frequency, gives the
average number of transitions per unit time.
The result of heat dissipation on state change is to limit the amount of
computation that may be performed on a given power budget. While device
shrinkage can reduce some of the parasitic capacitances, the number of devices
on an integrated circuit chip has increased more than enough to compensate for
reduced capacitance in each individual device.
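The relation above can be illustrated numerically. The sketch below evaluates the dynamic power of a single capacitive node for illustrative values of capacitance, supply voltage, clock frequency, and switching factor (all of these numbers are assumptions, not figures from this thesis):

```python
# Dynamic power estimate: P = alpha * f * (1/2) * C * V^2,
# where the energy dissipated per transition is taken to be
# comparable to the stored energy E = (1/2) * C * V^2.

def dynamic_power(c_farads, v_volts, f_hertz, alpha):
    """Average switching power of one capacitive node."""
    energy_per_transition = 0.5 * c_farads * v_volts ** 2
    return alpha * f_hertz * energy_per_transition

# Illustrative values (assumed): 10 fF node, 1.2 V supply,
# 1 GHz clock, 0.1 transitions per cycle on average.
p = dynamic_power(10e-15, 1.2, 1e9, 0.1)
print(p)  # watts
```

Note that the supply voltage enters quadratically, which is why voltage scaling is such an effective low power lever.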
1.2 Low Power Systems
Design optimization is done at all the design levels involved. These design
levels are generally stacked in four distinct layers: device, circuit, system, and
software. Optimization of power and/or area often leads to degradation of speed,
and vice versa. The inverse relation between power and speed results in a design
trade-off, often leading to applications being classified into low power or high speed
scenarios. Almost all research in electronics is governed by, or rather guided or
motivated by, Moore's Law [4], which describes a long-term trend in the history of
computing hardware, according to which the number of transistors that can be
placed inexpensively on an integrated circuit has doubled approximately every
two years. In Moore's own words - “The complexity for minimum component
costs has increased at a rate of roughly a factor of two per year... Certainly over
the short term this rate can be expected to continue, if not to increase. Over the
longer term, the rate of increase is a bit more uncertain, although there is no
reason to believe it will not remain nearly constant for at least 10 years. That
means by 1975, the number of components per integrated circuit for minimum
cost will be 65,000. I believe that such a large circuit can be built on a single
wafer.”
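As a quick arithmetic check of the quoted prediction, assuming a 1965 baseline of roughly 64 components per chip (an illustrative assumption) and one doubling per year for ten years:

```python
# Ten annual doublings from an assumed 1965 baseline of 64 components.
components_1975 = 64 * 2 ** 10
print(components_1975)  # 65536, in line with the quoted 65,000
```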
While it is generally accepted that this exponential improvement trend will end,
it is unclear exactly how dense and fast integrated circuits will get by the time this
point is reached. Working devices have been demonstrated that were fabricated
with a MOSFET transistor channel length of 6.3 nanometres using conventional
semiconductor materials, and devices have been built that used carbon
nanotubes as MOSFET gates, giving a channel length of approximately one
nanometre. The density and computing power of integrated circuits are limited
primarily by power dissipation concerns.
Another obstacle to this trend [5], especially for mobile devices, is that
battery technology has been following a slower improvement trend. Recent
improvements in battery and process technology have been aimed at meeting
the increased energy demands of the up-and-coming portable systems. However,
it may be years before next-generation battery technologies, such as fuel cells,
become commercially viable. Hence research is being carried out at all levels of
the system stack to enable reduction in power consumption. Designers today
continue to be challenged with the need to manage power, timing and signal
integrity concurrently throughout the design flow. Traditional power optimization
techniques and today's power-aware design flows are proving insufficient in the
design of systems-on-a-chip (SoCs) for next-generation applications, and must
evolve to enable design for energy efficiency.
Device level optimization involves the design of newer devices with better
scaling and smaller device dimensions. But this may not last for long, since many
limiting factors come into the picture as the transistor feature size is reduced
[10]. Optimization at the circuit level involves the design of more efficient circuits to
implement the required logic. There are multiple circuit design techniques in
practice [16-17], which provide the desired power-delay characteristics. This also
involves implementation of newer logic offering more functionality [11]. Examples
of this include the design of universal BCD/Binary adders, bidirectional shift
registers, etc. System level optimization involves the architecture of the system,
at the micro and macro levels. Finally, if the system is programmable, the
software running on the system also plays a vital role in performance. There has
been considerable research on this front, yielding many standard techniques that,
when adopted, provide better performance results. Optimization techniques at
the system and circuit levels, which are of immediate concern, are
elaborated in the following paragraphs.
1.2.1 System Architecture Level
Work done at this level concentrates on system components as a whole.
This is generally done at the processor level, or as part of the operating system
governing the processor, if any. In the absence of an operating system the
system software is worked upon. A recent segment of computers, as opposed to
servers, desktops, and laptops, has become quite popular: the Netbook [6].
Netbooks are a branch of sub-notebooks, a rapidly evolving category of small,
lightweight, and inexpensive laptop computers suited for general computing and
accessing Web-based applications; they are often marketed as "companion
devices", i.e., to augment a user's primary computer access. These devices were
built around low power processors which shed some desirable architectural
features (e.g. VIA Nano, ARM, Intel Atom [7]). The performance per watt
(MHz/Watt or MIPS/Watt) has been accepted as a metric of comparison. This is
exemplified in the new list of top 500 environmentally efficient supercomputers
[8][9], in addition to the top 500 supercomputer list, which is ordered in terms of
efficiency in energy consumption rather than sheer performance.
1.2.2 Circuit Level
In another direction of research, power management protocols have been
developed incorporating various features that take into account circuit related
innovations. Some of the important features are voltage and frequency scaling
[11]. In these techniques, the clock frequency and core voltage of the processor
are changed depending on the required/expected performance of the processor.
This approach is limited by thermal noise within the circuit. There is a
characteristic voltage proportional to the device temperature and to the
Boltzmann constant, which the state switching voltage must exceed in order for
the circuit to be resistant to noise. This is typically on the order of 50–100 mV for
devices rated to 100 degrees Celsius external temperature (about 4kT/q, where T
is the device's internal temperature in kelvin, k is the Boltzmann constant, and q
is the electron charge).
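The order of magnitude of this noise floor can be reproduced with a short calculation; the factor of 4 and the choice of T = 300 K below are illustrative assumptions:

```python
# Thermal noise floor sketch: the switching voltage should exceed
# roughly 4*k*T/q to be resistant to thermal noise (assumed form;
# T = 300 K is an assumed internal temperature).

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C

def noise_floor_volts(t_kelvin, multiple=4.0):
    """Characteristic voltage a signal must exceed to resist thermal noise."""
    return multiple * K_BOLTZMANN * t_kelvin / Q_ELECTRON

print(round(noise_floor_volts(300.0) * 1000, 1))  # ~103.4 mV at 300 K
```

This lands near the upper end of the 50–100 mV range quoted above, which sets a hard floor on how far voltage scaling can go.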
Other popular techniques involve clock gating [12] and power gating [13] [14].
As per the Law of diminishing returns, any small improvement needed in the
output generally requires a large change in input once a high performance state
is achieved [15]. This is the case with system design in general, where design is
initially done with higher performance in mind, and a few parameters are
tweaked to obtain lower power dissipation. A better approach is to incorporate
low power features during the design phase. There are numerous circuit design
techniques, such as transistor sizing, logic optimization, activity driven power
down, low-swing logic, and adiabatic switching [16][17].
1.3 Proposed Technique – Inexact Circuit Design
Of late, a few unconventional methods of optimization have evolved, one of
which is the use of probabilistic logic [49]. This involves driving different parts of
the system with different source voltages with the presumption that the error
arising out of the Low-VDD operation can be tolerated. This has been
demonstrated for arithmetic circuits in areas of multimedia processing. Also the
error in this methodology is not under the designer's or user's control, but is
governed by a probability related to the device characteristics, and operating
conditions. This probability can however be theoretically modelled, in simpler
cases, or can be characterized empirically.
In this work, an orthogonal level of design optimization is proposed to
reduce the power dissipation in a system, by compromising on the veracity of the
system output. The term orthogonal is used in the sense that any other traditional
methodology can be applied in conjunction with the proposed technique. By
introducing a predetermined error, or inexactness, in the functionality of the
components of the system, the amount of hardware required to implement these
components can be reduced. This error can be introduced at the algorithm level,
logic design level, circuit level, or at the device level.
This work focuses on obtaining lower power dissipation using inexactness
at the logic layer, by analyzing the effect of using an inexact logic function on the
power and critical delay of the circuit in question, and the impact of inexactness
on the system performance. Here system performance is defined as the extent to
which the system with an inexact function can approximate the original system
function. Power reduction is achieved by replacing the required circuit function
with another function whose input-to-output relation is similar to, but not
necessarily a subset or superset of, the original. The functional difference
between the alternate circuit and the original circuit, or the level of inexactness,
can vary depending on the application. The field of work is further narrowed by considering
decision circuits in applications that are not affected by, or can tolerate,
inaccuracy. Low power applications are designed using these inexact logic
circuits.
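As a small illustration of the idea (a sketch, not one of the exact circuits used later in this thesis), consider a 3-input majority function in which one minterm is converted into a don't care; assigning that don't care suitably lets the function be synthesized with less logic, at the cost of an occasional wrong output:

```python
# Sketch of inexactness: a 3-input majority function with the
# minterm (a=0, b=1, c=1) converted into a don't care. Assigning
# that don't care to 0 lets the function collapse to "a and (b or c)",
# which needs noticeably less logic than the exact majority.
from itertools import product

def exact_majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def inexact_majority(a, b, c):
    return a & (b | c)

errors = sum(
    exact_majority(a, b, c) != inexact_majority(a, b, c)
    for a, b, c in product([0, 1], repeat=3)
)
print(errors, "of 8 input patterns differ")  # 1 of 8
```

Here one of the eight input patterns produces a wrong answer, while the gate count drops from five two-input gates to two.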
The idea was constructed as part of a circuit design required for a serial
data coding scheme [18] for low power transmission. For feasibility of the coding
scheme, an inexact version of the decision circuit was designed to facilitate
power reduction, using a Hamming threshold comparator as an example [19].
Additional applications supporting inexactness and a heuristic design framework
have also been presented [20]. Further optimization at the circuit and system
levels can follow this procedure. The proposed concept is explained in later sections,
followed by a design methodology.
For the purpose of this work the circuits used are an inexact Hamming
threshold voter, and a comparator. Applications are designed using these inexact
logic circuits, and are analyzed with varying levels of inexactness. These
applications are categorized according to their tolerance to the error introduced
due to the inexactness. A comparison between the levels of inexactness and the
power dissipation of the system and the system performance is presented in the
results section. The terms inexact circuit and inexact logic are used
interchangeably in this thesis.
1.4 Organization of the Thesis
The first chapter, so far, introduced the importance of low power design in
today's perspective, and provided a glimpse at the proposed technique in
addressing the issue of power dissipation. The rest of the thesis is organized as
follows:
• Chapter 2 provides the basic information required to aid in understanding
the concept and the work process.
• Chapter 3 explains the concept of inexactness, while also providing an
insight into the effect of circuit inexactness on the system.
• Chapter 4 covers a set of applications that are not affected by inexact
circuits, as the inexactness does not fall in the datapath of the system, or
can be rectified through the inexact decision.
• Chapter 5 covers application scenarios where the error introduced by
inexactness can be tolerated.
• Chapter 6 presents manual and automated design methodologies to build
inexact circuits.
• Chapter 7 elaborates on the power performance results of the designed
inexact circuits, and the impact of inexactness on system accuracy.
• Chapter 8 concludes the thesis by summarizing the results, and inferring
the advantages of inexact logic in low power system design.
Chapter 2
For Basic Understanding
This chapter provides a basic understanding of the concepts involved, to give
the reader a better perspective.
2.1 Digital Electronics
Digital circuits [21] are electronic circuits based on a number of discrete
voltage levels. They are the most common physical representation of Boolean
algebra and are the basis of all digital computers. The terms "digital circuit",
"digital system" and "logic" are interchangeable in the context of digital circuits.
Most digital circuits use two voltage levels labelled "Low" and "High". Often "Low"
will be near zero volts and "High" will be at a higher level nearer to the supply
voltage in use. The fundamental advantage of digital techniques stems from the
fact that it is easier to get an electronic device to switch into one of a number of
known states than to accurately reproduce a continuous range of values.
Computers, digital signal processors, programmable logic controllers
(used to control industrial processes), cell phones, audio players, are examples
of applications constructed around digital circuits. Digital electronics are usually
made from large assemblies of logic gates, which are simple electronic
representations of Boolean logic functions. A Boolean function describes how to
determine a Boolean value output based on some logical calculation from
Boolean inputs. Such functions play a basic role in questions of complexity theory
as well as the design of circuits and chips for digital computers. Engineers use
many methods to minimize logic functions, in order to reduce the circuit's
complexity. When the complexity is lower, the circuit has fewer errors and
less electronics, and is therefore less expensive. This also results in lower
power utilization during operation. Historically, binary decision diagrams,
Quine–McCluskey [22][23] algorithm (automated), truth tables, Karnaugh Maps
[24], and Boolean algebra have been used, to aid in the process of simplification.
The most widely used practical simplification is a minimization algorithm like the
Espresso [25] heuristic logic minimizer within a CAD system. Other heuristic
techniques like genetic algorithms, and swarm intelligence are also used [28-32].
2.2 Circuit Minimization
In Boolean algebra, circuit minimization is the problem of obtaining the
smallest logic circuit (Boolean formula) that represents a given Boolean function
or truth table. The general circuit minimization problem is believed to be
intractable [33][34], but there are effective heuristics such as Karnaugh maps [24]
and the Quine–McCluskey algorithm [22][23] that facilitate the process. The
problem with having a complicated circuit (i.e. one with many elements, such as
logical gates) is that each element takes up physical space in its implementation
and costs time and money to produce in itself.
While there are many ways to minimize a circuit [33], this is an example
that minimizes (or simplifies) a Boolean function. Note that the Boolean function
carried out by the circuit in Figure 2.1 is used to compute the expression given by
(A’ and B) or (A and B’). It is evident that two negations, two conjunctions, and a
disjunction are used in this statement. This means that to build the circuit one
would need two inverters, two AND gates, and an OR gate.
Figure 2.1. Example of Circuit Minimization
We can simplify (minimize) the circuit by applying logical identities or
using intuition. Since the function is true when A is true and B is false, or the
other way around, we can conclude that it simply tests whether A is not equal to B. In
terms of logical gates, inequality simply means an XOR gate (exclusive or).
Therefore, the two circuits shown in the figure are equivalent.
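The equivalence of the two circuits can be checked exhaustively. The following Python sketch (an illustration, not part of the original design flow) compares both implementations over all input combinations:

```python
from itertools import product

def original_circuit(a, b):
    # (A' AND B) OR (A AND B'): two inverters, two AND gates, one OR gate
    return (not a and b) or (a and not b)

def minimized_circuit(a, b):
    # A single XOR gate
    return a != b

# Exhaustive check over all four input combinations
for a, b in product([False, True], repeat=2):
    assert original_circuit(a, b) == minimized_circuit(a, b)
print("circuits are equivalent")
```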
2.3 Truth Table
A truth table is a mathematical table used in logic—specifically in
connection with Boolean algebra, and Boolean function—to compute the
functional values of logical expressions on each of their functional arguments,
that is, on each combination of values taken by their logical variables [35].
Practically, a truth table is composed of one column for each input variable
(for example, A and B), and one final column for all of the possible results of the
logical operation that the table is meant to represent (for example, A OR B). Each
row of the truth table therefore contains one possible configuration of the input
variables (for instance, A=true B=false), and the result of the operation for those
values. A full adder’s truth table is shown in Figure 2.2, along with its K-Map
minimization. K-Map minimization is elaborated in the next section.
Figure 2.2. Truth Table & K-Map of a Full Adder
Truth tables are a simple and straightforward way to encode Boolean
functions. However, given the exponential growth in size as the number of inputs
increases, they are not suitable for functions with a large number of inputs. Other
representations which are more memory efficient are textual equations and
binary decision diagrams. In digital electronics, truth tables can be used to
reduce basic Boolean operations to simple correlations of inputs to outputs,
without the use of logic gates or code.
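As an illustration, the full adder's truth table from Figure 2.2 can be regenerated programmatically. This Python sketch is illustrative only; the function names are not from the thesis:

```python
from itertools import product

def full_adder(a, b, cin):
    """One full-adder stage: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

# Print one truth-table row per input combination
print("A B Cin | Sum Cout")
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b}  {cin}  |  {s}    {cout}")
```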
2.4 Karnaugh Map
The Karnaugh map [24] (K-map for short), Maurice Karnaugh's 1953
refinement of Edward Veitch's 1952 Veitch diagram, is a method to simplify
Boolean algebra expressions. The Karnaugh map reduces the need for extensive
calculations by taking advantage of humans' pattern-recognition capability,
permitting the rapid identification and elimination of potential race conditions.
In a Karnaugh map the Boolean variables are transferred (generally from a
truth table) and ordered according to the principles of Gray code in which only
one variable changes in value in between squares along rows/columns. Once the
table is generated and the output possibilities are transcribed, the data is
arranged into the largest possible groups containing 2^n cells (n = 0, 1, 2, 3, ...) and the
minterm is generated through the axiom laws of Boolean algebra.
The size of the Karnaugh map with 'n' Boolean variables is determined by 2^n. The
size of a group within a Karnaugh map with 'n' Boolean variables and 'k'
literals in the corresponding product term is determined by 2^(n-k). A
generic 4 variable K-Map is shown in Figure 2.3.
Figure 2.3. A four variable minterm Karnaugh map
Normally, extensive calculations are required to obtain the minimal
expression of a Boolean function; however Karnaugh mapping reduces the need
for such calculations by:
• Taking advantage of the human brain's pattern-matching capability to
decide which terms should be combined to obtain the simplest expression.
• Permitting the rapid identification and elimination of potential race hazards,
which are difficult to spot in Boolean equations.
• Providing an excellent aid for simplification of up to six variables, however
with more variables it becomes more difficult to discern optimal patterns.
• Helping to teach about Boolean functions and minimization.
• For problems involving more than six variables, solving the Boolean
expressions is preferred over the use of a Karnaugh mapping.
Karnaugh maps generally become more cluttered and hard to interpret
when adding more variables. A general rule is that Karnaugh maps work well for
up to four variables, and shouldn't be used at all for more than six variables. For
expressions with larger numbers of variables, the Quine–McCluskey algorithm
can be used.
When the Karnaugh map has been completed, to derive a minimized
function the "1s" or desired outputs are grouped into the largest possible
rectangular groups in which the number of grid boxes (output possibilities) in the
groups must be equal to a power of 2. For example, the groups may be 4 boxes
in a line, 2 boxes high by 4 boxes long, 2 boxes by 2 boxes, and so on. "Don't
care" possibilities (generally represented by an "X") are grouped only if the
resulting group is larger than it would be with the "don't care" excluded. A box
may be used in more than one group if doing so reduces the total number of groups.
Each "1" or desired output possibility must be contained within at least one
grouping.
Figure 2.4. 4 set Venn diagram with numbers (0-15) and set names (A-D)
The groups generated are converted to a Boolean expression by: locating
and transcribing the variable possibility attributed to the box, and by the axiom
laws of Boolean algebra—in which if the (initial) variable possibility and its
inverse are contained within the same group the variable term is removed. Each
group provides a "product" to create a "sum-of-products" in the Boolean
expression. To determine the inverse of the Karnaugh map, the "0s" are grouped
instead of the "1s". The two expressions are non-complementary.
Each square in a Karnaugh map corresponds to a minterm (and maxterm).
The picture in Figure 2.4 shows the location of each minterm on the map. A Venn
diagram of four sets—labeled A, B, C, and D—is shown to the right that
corresponds to the 4-variable K-map of minterms just above it:
• Variable A of the K-map corresponds to set A in the Venn diagram; etc.
• Minterm m0 of the K-map corresponds to area 0 in the Venn diagram; etc.
• Minterm m9, i.e. AB'C'D (binary 1001), in the K-map corresponds only to where
sets A & D (but not B or C) intersect in the Venn diagram.
Thus, a specific minterm identifies a unique intersection of all four sets.
The Venn diagram can include an infinite number of sets and still correspond to
the respective Karnaugh maps. With increasing number of sets and variables,
both Venn diagram and Karnaugh map increase in complexity to draw and
manage. The grid is toroidally connected, so the rectangular groups can wrap
around edges. For example m9 can be grouped with m1; just as m0, m8, m2,
and m10 (the four corner cells) can be combined into a two-by-two group.
2.5 Genetic Algorithms
Genetic algorithms [36] are implemented as a computer simulation in which
a population of abstract representations (called chromosomes, or the genotype of
the genome) of candidate solutions (called individuals, creatures, or phenotypes)
to an optimization problem evolves toward better solutions.
Traditionally, solutions are represented in binary as strings of 0s and 1s, but
other encodings are also possible. The evolution usually starts from a population
of randomly generated individuals and happens in generations. In each
generation, the 'fitness' of every individual in the population is evaluated, multiple
individuals are stochastically selected from the current population (based on their
fitness), and modified (recombined and possibly randomly mutated) to form a
new population. The new population is then used in the next iteration of the
algorithm. Commonly, the algorithm terminates when either a maximum number
of generations has been produced, or a satisfactory fitness level has been
reached for the population. The algorithm is shown below.
Algorithm 1: Basic Genetic Algorithm _
1. Generate an initial population.
2. Calculate the fitness function for each individual.
3. repeat
3.1. Select two parents from individuals of last generation for crossover.
3.2. Cross individuals with a probability.
3.3. Mutate both parents with a probability.
3.4. Calculate the fitness for the mutated individuals.
3.5. Insert the mutated individuals in the new generation.
4. until convergence
_ _
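Algorithm 1 can be sketched in Python as follows. The OneMax fitness function (counting 1s in the chromosome) is a toy stand-in assumption; the thesis applies genetic algorithms to circuit cost metrics instead, and the parameter values are arbitrary:

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=20, generations=100,
                      p_cross=0.9, p_mut=0.05):
    # Step 1: generate an initial population of random bit strings
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: calculate the fitness for each individual
        ranked = sorted(pop, key=fitness, reverse=True)
        new_pop = [ranked[0][:], ranked[1][:]]   # elitism: keep the best two
        while len(new_pop) < pop_size:
            # Step 3.1: select two parents (tournament selection)
            p1 = max(random.sample(pop, 3), key=fitness)
            p2 = max(random.sample(pop, 3), key=fitness)
            # Step 3.2: cross the individuals with probability p_cross
            if random.random() < p_cross:
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Step 3.3: mutate each bit with probability p_mut
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < p_mut:
                        child[i] ^= 1
                # Steps 3.4-3.5: insert mutated individuals into the new generation
                new_pop.append(child)
        # Step 4: repeat until the generation budget is exhausted
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Toy fitness: number of 1s in the chromosome (OneMax)
best = genetic_algorithm(sum)
print(sum(best))
```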
If the algorithm has terminated due to a maximum number of generations,
a satisfactory solution may or may not have been reached. The structure shown
above is common to most evolutionary optimization techniques. Genetic algorithms
have been used in circuit minimization as shown in [28][29]. Most genetic
algorithm based circuit designs use a Cartesian Genetic Programming array,
where the genotype encodes the input-output relations of, and the
interconnections among, a matrix of programmable logic functions.
2.6 Hamming Threshold Voter
Let H(X) denote the Hamming weight of an n-bit binary vector X = {x1, x2, .
. . , xn}, i.e. the number of 1’s in it. Here we consider circuits that compare H(X) to
a fixed threshold 'k'. The output of the voter is a single bit indicating whether the
number of 1's is above the threshold or not. Some Hamming weight comparators
were proposed recently in [2.17][2.18]. They can also be designed using two n-bit
counters of 1’s and a comparator of K-bit integers, where K is the logarithm of the
bit length; for the most complete survey of counters of 1’s see [39].
The numerous applications of such comparators include digital neural
networks [40], pattern matching and data compression [41][42] and median and
rank order filters [43][44]. Since a 50% threshold is generally used in the
applications considered for analysis, a special case of the Hamming threshold
voter, called the majority voter is used. This is also called a Hamming
Comparator in later chapters.
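Behaviourally, the voter reduces to a population count followed by a threshold comparison. A small Python sketch (illustrative only; the function names are not from the cited works):

```python
def hamming_weight(x):
    """H(X): the number of 1s in the binary vector X."""
    return bin(x).count("1")

def hamming_threshold_voter(x, k):
    """Output 1 if H(X) exceeds the fixed threshold k, else 0."""
    return 1 if hamming_weight(x) > k else 0

def majority_voter(x, n):
    """Special case with a 50% threshold for an n-bit vector."""
    return hamming_threshold_voter(x, n // 2)

print(majority_voter(0b10110111, 8))  # six 1s out of 8 bits -> 1
print(majority_voter(0b00010001, 8))  # two 1s out of 8 bits -> 0
```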
2.7 Digital Comparator
A digital comparator or magnitude comparator is a hardware electronic
device that takes two numbers as input in binary form and determines whether
one number is greater than, less than or equal to the other number. Comparators
are used in central processing units (CPUs) and microcontrollers. Examples of
digital comparators include the CMOS 4063 and 4585 and the TTL 7485 and
74682-'89. Consider two 4-bit binary numbers A and B such that A = A3A2A1A0
and B = B3B2B1B0. Here each subscript represents one of the digits in the
numbers.
The binary numbers A and B will be equal if all the pairs of significant
digits of both numbers are equal, i.e., A3 = B3, A2 = B2, A1 = B1 and A0 = B0. Since
the numbers are binary, the digits are either 0 or 1 and the boolean function for
equality of any two digits Ai and Bi can be expressed as:

xi = AiBi + Ai'Bi'

i.e., xi is the XNOR of Ai and Bi, so xi is 1 only if Ai and Bi are equal. For the equality of A and B, all xi
variables (for i=0,1,2,3) must be 1. So the equality condition of A and B can be
implemented using the AND operation as (A = B) = x3x2x1x0. The binary variable
(A=B) is 1 only if all pairs of digits of the two numbers are equal.
In order to manually determine the greater of two binary numbers, we
inspect pairs of similar weighted bits, starting from the most significant bit,
gradually proceeding towards lower significant bits until an inequality is found.
When an inequality is found, if the corresponding bit of A is 1 and that of B is 0
then we conclude that A>B.
This sequential comparison can be expressed logically as:

(A>B) = A3B3' + x3A2B2' + x3x2A1B1' + x3x2x1A0B0'
(A<B) = A3'B3 + x3A2'B2 + x3x2A1'B1 + x3x2x1A0'B0

(A>B) and (A<B) are output binary variables, which are equal to 1 when A>B or
A<B respectively. Often, it is required only to know the greater or the lesser of
two values. Higher order comparators are generally built with a series of smaller
comparators [45][46].
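The bitwise equality test and the most-significant-bit-first scan described above can be modelled in Python. This is an illustrative behavioural model under the section's definitions, not a gate-level design:

```python
def compare_4bit(A, B):
    """Magnitude comparison of two 4-bit numbers A = A3A2A1A0, B = B3B2B1B0."""
    a = [(A >> i) & 1 for i in range(3, -1, -1)]   # a[0] is the MSB, A3
    b = [(B >> i) & 1 for i in range(3, -1, -1)]
    x = [1 - (ai ^ bi) for ai, bi in zip(a, b)]    # xi = 1 iff Ai == Bi (XNOR)
    eq = int(all(x))                               # (A=B) = x3 x2 x1 x0
    gt = lt = 0
    prefix = 1                                     # all higher-order pairs equal so far
    for ai, bi, xi in zip(a, b, x):                # scan from the MSB downwards
        gt |= prefix & ai & (1 - bi)               # first unequal pair decides A>B
        lt |= prefix & (1 - ai) & bi               # ... or A<B
        prefix &= xi
    return gt, eq, lt

print(compare_4bit(0b1010, 0b0111))  # 10 vs 7 -> (1, 0, 0)
print(compare_4bit(0b0101, 0b0101))  # 5 vs 5  -> (0, 1, 0)
```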
Chapter 3
Inexact Circuit Design
This chapter elaborates on the proposed concept of inexactness, and
discusses related work in this field. To date, there has been little
investigation into the usage of inexact circuits in a variety of application scenarios.
3.1 Related Work
Inexact/Approximate systems are not a completely new concept.
Approximate or inexact solutions are found in abundance in the fields of
computer science and operations research. Here, approximation algorithms are
algorithms used to find approximate solutions to optimization problems.
Algorithms have been approximated before to solve NP-Hard problems [47][48].
Approximation algorithms are often associated with NP-hard problems; since it is
unlikely that there can ever be efficient polynomial time exact algorithms solving
NP-hard problems, one settles for polynomial time sub-optimal solutions.
Unlike heuristics, which usually only find reasonably good solutions
reasonably fast, one wants provable solution quality and provable run time
bounds. Ideally, the approximation is optimal up to a small constant factor (for
instance within 5% of the optimal solution). Approximation algorithms are
increasingly being used for problems where exact polynomial-time algorithms are
known but are too expensive due to the input size. A typical example for an
approximation algorithm is the one for vertex cover in graphs, which involves
finding an uncovered edge and adding both endpoints to the vertex cover, until
none remain. It is clear that the resulting cover is at most twice as large as the
optimal one. This is a constant factor approximation algorithm with a factor of 2.
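The vertex-cover procedure just described is short enough to sketch directly. This Python version is illustrative, with an arbitrary example edge list:

```python
def approx_vertex_cover(edges):
    """2-approximation: pick any uncovered edge, add both endpoints, repeat."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge is still uncovered
            cover.update((u, v))
    return cover

edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
cover = approx_vertex_cover(edges)
assert all(u in cover or v in cover for u, v in edges)  # every edge is covered
print(sorted(cover))  # -> [1, 2, 3, 4]
```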
Approximation is also carried out for non NP-hard problems, like in [98],
where the mean and median of a set of numbers is computed in an approximate
fashion to reduce the time complexity. Other approximate solutions of this kind
are used in [99-101], with applications ranging from string matching and pattern
recognition to optimization problems.
Approximating computation is common in digital filters [102] where the
data obtained on filtering is either rounded off or truncated to the nearest value.
Truncation is the process of dropping the last few Least Significant Bits (LSBs),
so that the final result can fit in the hardware register provided. Rounding
off is the process of approximating the data value to the nearest value at a given
precision. For example, 3.78 can be rounded off to 3.8, and 6.11 can be rounded
off to 6.1; here the second digit after the decimal point is rounded into the
first. Also, the coefficients chosen for these filters are approximated.
Inexactness has been generally avoided in circuit design until recent times.
This was because technology scaling and other techniques could provide the
required power budget. As scaling becomes stagnant, newer unorthodox
techniques of achieving design goals must be sought. There have been efforts in
alleviating the effect of process variation on VLSI circuits. Architectures have
been proposed [97] for certain applications where computation paths which
contribute less to the final result are made longer so that under process variation,
delay errors in these paths do not affect the final outcome significantly.
A recent development in low power system design was the advent of
probabilistic logic [49-51]. The authors here propose that arithmetic circuits can
be built to operate with error in applications that either benefit from (or harness)
probabilistic behavior at the device level, or can tolerate such behavior.
Probabilistic logic is achieved in the
form of noise during circuit operation. Circuits operated at a voltage closer to the
CMOS threshold voltage for that technology, tend to be affected more by noise.
Applications that can tolerate or utilize this noise can be designed using this
technique. In the former case, the examples considered include Bayesian Inference [52],
Probabilistic Cellular Automata [53], Random Neural Networks [54], and Hyper
Encryption [55]. In the domain of applications that tolerate probabilistic behavior,
they investigate applications which can trade energy and performance for
application-level quality of the solution.
Applications in the domain of digital signal processing were chosen, where
application-level quality of solution is naturally expressed in the form of signal-to-
noise ratio or SNR. In this context, the adders used in the filters in the H.264
decoding algorithm [56] were implemented in probabilistic logic by scaling the
voltage across the bit-length, with the voltage reducing towards the LSB. This
technique is elaborated here to avoid confusing it with the proposed inexactness.
The proposed inexactness is at logic level, and is predetermined and fixed
at the design stage. The inexactness is not a result of the operating conditions.
Whereas, in probabilistic logic, the error in the circuit arises due to the voltage
scaling, and circuit partitioning for proper voltage scaling, as well as its
analysis, is difficult. Also, the task of providing multiple voltage sources is a
design burden, because of which the number of voltages is generally restricted to
two.
An existing method to design approximate logic circuits is proposed in [57],
where certain 0-minterms are assumed as don’t-cares to form 1-approximate or
0-approximate circuits. The resultant circuit is a subset of the original circuit,
catering to a subset of the input vectors. The purpose of introducing this
approximation in the function is to reduce the area overhead in performing
concurrent error detection. The system accuracy is not compromised, as this
approximate circuit exists in conjunction with the original circuit. In the proposed
work, however, as the system accuracy is compromised, the system designer
has to be provided with a way to vary the inexactness of the circuit, so as to
achieve a satisfactory design trade off. This can be done with the heuristic
framework proposed later on. Also, comparison of the proposed framework with
this algorithm is redundant, as the solution obtained through the algorithm is part
of the design space being explored by the heuristic framework. It is of importance
to note the difference between introducing inexactness, and the use of don’t-
cares [34] in digital logic design. Don’t-cares are part of the system and are
included in the system specification. In digital logic, a don't-care term is an input-
sequence/vector to a Boolean function that the designer does not care about,
usually because that input would never happen, or because differences in that
input would not result in any changes to the output. By considering these don't-
care inputs, designers can potentially minimize their function much more so than
if the don't-care inputs were taken to have an output of all 0 or all 1. Examples of
don't-care terms are the binary values 1010 through 1111 (10 through 15 in
decimal) for a function that takes a BCD value, because a BCD value never takes
on values from 1010 to 1111. This is different from deliberately introducing an
error in the function by inverting existing minterms or maxterms. The concept of
inexactness is elaborated in the following section.
3.2 Concept of Inexactness
Design of Inexact circuits is the process of approximating the function of
the logic circuits to be optimized. To elaborate, assume the logic function to be
implemented has a certain set of minterms M. The final circuit implemented will
be a set of minterms M', similar to M, which need neither be a complete subset
nor a superset of M, nor even closely resemble it. The desirable characteristic for
M' is such that the circuit resulting from it requires much less power to operate (or
less delay), and the error arising out of it can be tolerated by the system.
Traditional design techniques assume exact operation of a circuit, according to its
specifications. In case of digital system design, the truth table of the system has
to be fully applicable to the circuit designed. Instead of designing an exact
system, certain inexactness can be introduced if it does not lead to unacceptable
performance. This can be either in terms of system accuracy, or human
perception.
Figure 3.1. (a) Exact K-Map (b) Inexact K-Map
For example, consider the function represented in the K-Map shown in
Figure 3.1a. The single ‘1’ that represents the term ab`c`d` involves multiple
gates for its implementation. If it can be ascertained that removing this '1' does
not lead to degradation of system performance, the function can be implemented
in an inexact manner, as given in Figure 3.1b. Also the single ‘1’ that corresponds
to a`bcd` needs more gates as shown in Figure 3.1b. If an extra ‘1’ can be added
at a`b`cd` the hardware required to implement the function is reduced.
The error in the system depends on the number of input vectors whose
output is altered, and the total number of input vectors. In the example function
considered, the inexact versions induce a 12.5% error (2 vectors in 16) in the
system assuming all vectors occur equally likely. The circuit designed from this
inexact version of the system will consume less power, occupy less area, and
may also have a shorter critical path delay, if the neglected term was originally
the only term at its level of the critical path. Not all decision circuits can be
designed in an inexact fashion. The extent of inexactness has to be quantified,
and its effect on the system as a whole has to be analyzed extensively before
such a step is taken.
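The error metric just defined — altered vectors over total vectors, with all vectors equally likely — can be computed directly. The two lambda functions below are hypothetical stand-ins for an exact and an inexact circuit, chosen to disagree on exactly 2 of the 16 input vectors, mirroring the Figure 3.1 example:

```python
from itertools import product

def error_rate(exact_fn, inexact_fn, n_inputs):
    """Fraction of input vectors whose output is altered, all vectors equiprobable."""
    vectors = list(product([0, 1], repeat=n_inputs))
    altered = sum(exact_fn(*v) != inexact_fn(*v) for v in vectors)
    return altered / len(vectors)

# Hypothetical 4-input functions: the exact circuit has a lone minterm at 1000,
# the inexact one replaces it with a minterm at 0010.
exact_fn   = lambda a, b, c, d: int((b and d) or (a and not b and not c and not d))
inexact_fn = lambda a, b, c, d: int((b and d) or (not a and not b and c and not d))

print(error_rate(exact_fn, inexact_fn, 4))  # -> 0.125 (2 vectors in 16)
```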
The application scenarios for inexact circuits can be classified into 3 broad
categories depending on the impact of inexactness on the system accuracy. It is
important to note that this accuracy is not the error in the circuit itself, but its
reflection on the system output. Applications are classified into 3 categories
where inexactness has:-
a) No Impact on system accuracy
b) Tolerable Impact on system accuracy
c) Significant Impact on system accuracy
Applications of the 1st category include branch prediction, bus
coding, cache replacement, and coder/decoders, where either decisions are made
only for the purpose of additional optimization, or the decision bits are available to
discern the operation that was performed. Applications belonging to the 2nd
category are more numerous, including a large number of image processing
applications, timer applications, network stacks, etc. Here, the error made can be
tolerated, owing to the malleable nature of the application. The 3rd category
of applications, which do not tolerate any error, are generally part of a defined
state machine or are used for the sake of reliability. This category consists of
error correcting codes, voters for redundancy, etc.
Without loss of generality, applications falling under the first two categories
were chosen for analysis. These applications are chosen such that they share
some circuitry that can be implemented in inexact logic. Applications belonging to
the first category are discussed in the next chapter, and those pertaining to the
second class are discussed in the chapter following the next. Design
methodologies to build inexact logic circuits are presented after that.
Chapter 4
Applications under the No Impact Category
In this chapter, two applications where inexactness does not affect the
data accuracy of the system namely, (i) the bus invert application and (ii) a Page
replacement scheme are elaborated. The decision circuit used in the former
application is a Hamming threshold voter, and in the latter, a comparator.
4.1 Bus Coding
Recent advances in computing applications like graphics and scientific computing
demand data transfer at such high rates that bus interfaces are being constantly
pushed to higher performance points. These applications are highly memory
intensive rather than just CPU intensive. They need an enormous amount of
data to be transferred for computation which has increased bandwidth
requirements of off-chip busses. This in turn entails higher frequencies and
hence higher power consumption.
Reducing this off-chip bus power consumption has become one of the key
issues for low power system design. The fact that the power consumed in bus
accesses accounts for a significant fraction of the total power consumed in VLSI
(Very Large Scale Integration) systems has been independently
established by many researchers [58-60]. Numerous techniques have been
proposed in the past in order to reduce the effect of self-capacitance and
coupling-capacitance of buses on the power dissipation of an integrated circuit.
Wire shaping [61], buffer insertion [62], and several bus coding schemes [63-67]
have been used to reduce power dissipation due to coupling-capacitance
between adjacent wires. Mitigating the effect of self-capacitance involves the
reduction of data transitions on the bus. This involves some form of data coding.
Generally, a decision making circuit is required to ascertain whether the data has
to be coded or not. Most techniques like this involve a majority voter, or a similar
circuit.
Figure 4.1. Bus Invert Block Diagram
The first major initiative in bus coding schemes was the Bus-invert coding
scheme [58], represented in Figure 4.1. A number of bus coding techniques that
followed were modifications of the Bus-invert technique. However, Bus Invert
remains one of the very few practically used bus coding schemes, finding
application in Double Data Rate (DDR) Synchronous Dynamic Random Access
Memory (SDRAM), and other bus architectures.
Bus invert works by counting the number of transitions, which involves
XORing of the present and previous data. If the number of transitions is more
than half the bus width, the inverted data is transmitted, else the original data is
transmitted. A separate line is also added to the bus which will carry the decision.
The decision bit will signify whether the data that is on the bus is the original data
or its complement. The bus invert algorithm is explained below:
Algorithm 2: Bus Invert _
1. Count the transitions between the data on the bus and the next data that is to
be put on bus
2. if transitions count <= half of the bus width
2.1. Assign next data to bus
3. else
3.1. Invert the next data and assign the complement to bus
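Algorithm 2 amounts to a transition count followed by a conditional complement. Here is a Python model exercised with the data from Table 4.1 (illustrative only; the function name is not from the cited scheme):

```python
def bus_invert(current, nxt, width=8):
    """Return (data placed on the bus, invert-line value)."""
    transitions = bin(current ^ nxt).count("1")
    if transitions <= width // 2:
        return nxt, 0                 # send the data unchanged
    mask = (1 << width) - 1
    return nxt ^ mask, 1              # send the complement

# Data from Table 4.1: 6 transitions out of 8 bits, so the data is inverted
current, nxt = 0b10101011, 0b01110101
sent, invert = bus_invert(current, nxt)
print(invert, bin(current ^ sent).count("1"))  # -> 1 2 (transitions cut to N - t = 2)
```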
A block diagram of the system is shown in Figure 4.1. A sample decision
making process is shown in Table 4.1.
Table 4.1: Bus Invert Decision Making
Bit No. 1 2 3 4 5 6 7 8
Current Data on bus 1 0 1 0 1 0 1 1
Next Data to be put on Bus 0 1 1 1 0 1 0 1
XOR of present and next data 1 1 0 1 1 1 1 0
In the given example the number of transitions is 6, which is more than
half the bus width, 4. So the data is inverted and then sent. The decision is sent
on a separate line. An XOR between the current data and the next data that is
put on the bus shows that the transitions are reduced to 2. This is given by (N-t),
where N is the bus width and t is the original number of transitions. The encoding
process is shown in Table 4.2.
Table 4.2: Bus Invert Decision Encoder
Bit No. 1 2 3 4 5 6 7 8
Next Data to be put on Bus 0 1 1 1 0 1 0 1
Next Data that is put on Bus 1 0 0 0 1 0 1 0
Current Data on bus 1 0 1 0 1 0 1 1
XOR of current and next data 0 0 1 0 0 0 0 1
The whole operation involves a chain of full adders to count the transitions
and then perform another XOR on the data that has to be sent. All these
operations have to be done before the next data arrives at the bus. The parallel
XOR array and the chain of full adders contribute to the delay in taking a decision.
Beyond this the encoder delay also has to be taken into account which involves a
parallel XOR to perform controlled inversion. This entire set of operations has to
be over by the time the next data arrives leading to a restriction on bandwidth.
Figure 4.2. K-Map of an Exact Majority Voter
Figure 4.3. K-Map of an Inexact Majority Voter
The major bottleneck in implementing such a scheme is the Majority Voter,
which is a special case of a Hamming threshold voter, explained in chapter 2. An
inexact version of the majority voter was designed by using the previously
discussed techniques. For example, using the guidelines mentioned later in
chapter 6, the minterms of an exact voter (Figure 4.2) were manipulated to look
like as in Figure 4.3. The circuits for the same are shown in Figures 4.4, 4.5, and
4.6.
Figure 4.4. 8-bit Majority Voter Circuit (FA – Full Adder)
Figure 4.5. 3-out-of-4 Block
Figure 4.6. 8-bit Inexact Majority Voter Circuit
The efficiency of the inexact circuit for the case of bus-invert was
compared with the same system using an exact majority voter. The efficiency is
taken in terms of the reduction in transitions. The inexact majority voter will
process certain input vectors in a wrong manner. The results of the different
inexact versions generated are compared with an exact voter as shown and
discussed in the results section.
The inexact voters were also used for a serial bus coding scheme
proposed by the author in [18]. This work outlined a novel Transition Inversion
based data coding protocol by which the transitions on the data line can be
reduced for synchronous serial buses like JTAG, SPI, I2C etc. In serial data
transfer, data is generally loaded onto a buffer in parallel or serial fashion and
then placed on the bus serially. The algorithm first determines the number of
transitions in the data word. If the serial data buffer is loaded in parallel, then a
majority voter circuit is used to count the number of transitions. Serially, the same
process can be done either by using an XOR gate between consecutive bits and
counting the '1's, or by a counter on the line that counts on both edges.
If the number of transitions is more than half the word length, the
transition states between the bits are inverted. In case transition inversion is
needed, the scheme operates by observing the transition states between any 2
bits. Accordingly, the encoded second bit is retained as the previous encoded bit
if there is a transition. If there is no transition, the previous encoded bit is inverted.
The decision bit signifying transition inversion is transmitted before transmitting
the encoded data. This results in an overhead for the system.
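The serial encoding rule above — keep the previous encoded bit where the original word has a transition, invert it where it does not — can be modelled in Python. This is an illustrative sketch of the scheme, not the hardware, and the function names are assumptions:

```python
def count_transitions(bits):
    return sum(p != q for p, q in zip(bits, bits[1:]))

def transition_invert(bits):
    """Encode the word if it has more transitions than half its length.
    Returns (encoded word, decision bit)."""
    if count_transitions(bits) <= len(bits) // 2:
        return list(bits), 0
    encoded = [bits[0]]
    for prev, cur in zip(bits, bits[1:]):
        if cur != prev:                      # transition in the original word
            encoded.append(encoded[-1])      # -> no transition in the encoding
        else:                                # no transition in the original word
            encoded.append(encoded[-1] ^ 1)  # -> transition in the encoding
    return encoded, 1

word = [1, 0, 1, 0, 1, 0, 1, 1]              # 6 transitions in an 8-bit word
enc, decision = transition_invert(word)
print(decision, count_transitions(enc))      # -> 1 1
```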
Figure 4.7. High Level Architecture for the Transition Inversion scheme
The decision of the transition inversion is made depending on the count of
the transitions and is stored. The circuit used for this purpose depends on
whether the data is loaded serially, or in a parallel fashion. The bit stream is
encoded if a transition inversion is needed. The bit stream is encoded on the fly
as the data is put on the bus, as shown in Figure 4.7. In the receiver the decoder
has to decode the incoming bit stream and recover the original data. If the serial
buffer is loaded in parallel (Parallel-In-Serial-Out), then the decision circuitry has
to be implemented as combinational logic. This implementation of the transition
counter and decision circuit is built using a majority voter circuit. This is replaced
by an inexact voter, and the performance is compared with regard to the
reduction in transitions. This is presented in the results section.
4.2 Translation Lookaside Buffer (TLB)
A TLB [68] is a CPU cache that memory management hardware uses to
improve virtual address translation speed. It was the first cache introduced in
processors. All current desktop and server processors (such as x86) use a TLB.
A TLB has a fixed number of slots that contain page table entries, which map
virtual addresses to physical addresses. Virtual memory is the address space
seen by a process and can be larger than the physical memory. This space is
divided into pages of a predetermined size. Generally only some pages are
loaded into physical memory, in locations depending on the page replacement
policies. In a computer operating system that uses paging for virtual memory
management, page replacement algorithms decide which memory pages to page
out (swap out, write to disk) when a page of memory needs to be allocated.
Paging happens when a page fault occurs and a free page cannot be used to
satisfy the allocation, either because there are none, or the number of free pages
is lower than some threshold.
When the page, that was selected for replacement and swapped out, is
referenced again, it has to be swapped in (read in from disk), and this involves
waiting for I/O completion. This determines the quality of the page replacement
algorithm: the less time waiting for page-ins, the better the algorithm. A page
replacement algorithm looks at the limited information about accesses to the
pages provided by hardware, and tries to guess which pages should be replaced
to minimize the total number of page misses, while balancing this with the costs
(primary storage and processor time) of the algorithm itself. Of the several
existing page replacement policies [69-72], Least Recently Used (LRU) comes
closest to the optimal performance that can be achieved. But due
to the hardware complexity of its implementation for larger TLBs, it is generally
replaced by a Clock or Aging algorithm, which coarsely approximates LRU.
The aging algorithm is a descendant of the Not Frequently Used (NFU)
algorithm, with modifications to make it aware of the time span of use, thus
making it an approximation of LRU as well. Instead of just incrementing the
counter of every referenced page (which puts equal emphasis on page references
regardless of their age), the reference counter of a page is first shifted right
(divided by 2) before the referenced bit is added to the left of that binary number.
For instance, if a page has reference bits 1,0,0,1,1,0 over the past 6 clock ticks, its
reference counter will evolve as follows: 10000000, 01000000, 00100000, 10010000,
11001000, 01100100. Page references closer to the present have more
impact than page references made long ago.
This ensures that pages referenced more recently, even if referenced less
frequently, have higher priority than pages referenced more frequently in the
past. Thus, when a page needs to be swapped out, the page with the lowest
counter is chosen. This is explained with the help of Figure 4.8.
Figure 4.8. Aging Page Replacement Algorithm Illustrated
Figure 4.8 represents a page table with six entries. Working from right to
left, the state of each of the pages (only the counter entries) at each of the six
clock ticks are shown. Consider the (a) column. After clock tick zero the R flags
for the six pages are set to 1, 0, 1, 0, 1 and 1. This indicates that pages 0, 2, 4
and 5 were referenced. This results in the counters being set as shown. It is
assumed they all started at zero so that the shift right, in effect, did nothing and
the reference bit was added to the leftmost bit. From the clock tick in (b), the
algorithm can be followed and extended similarly for clock ticks (c) to (e). When a
page fault occurs, the page whose counter has the lowest value is removed. Clearly,
a page that has not been referenced for, say, four clock ticks will have four
zeroes in the leftmost positions of its counter, and hence a lower value than a page
that has not been referenced for three clock ticks. Hardware support for this
consists of dedicated comparators that determine the smallest age value. This
comparator is replaced with
an inexact comparator. Performance comparison is made in terms of the ratio of
page faults.
It can be observed that aging differs from LRU in the sense that aging can
only keep track of the references in the latest 16/32 (depending on the bit size of
the processor's integers) time intervals. Consequently, two pages may have
reference counters of 00000000, even though one page was referenced 9
intervals ago and the other 1000 intervals ago. Generally speaking, knowing the
usage within the past 16 intervals is sufficient for making a good decision as to
which page to swap out. Thus, aging can offer near-optimal performance for a
moderate price.
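The counter update at the heart of the aging algorithm can be modelled in a few lines. This is an illustrative software sketch (the thesis's version is hardware); 8-bit counters are assumed, matching the worked example above.

```python
def age_tick(counters, referenced):
    """One clock tick of the aging algorithm with 8-bit counters:
    shift each counter right, then add the R bit as the new leftmost bit."""
    return [((c >> 1) | (r << 7)) & 0xFF for c, r in zip(counters, referenced)]

def victim(counters):
    """Page chosen for replacement: the one with the lowest counter value."""
    return min(range(len(counters)), key=lambda i: counters[i])
```

Feeding the reference bits 1,0,0,1,1,0 into a counter that starts at zero reproduces the sequence in the text, ending at 01100100.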
The decision circuit used in such a scheme, as mentioned above, is a
series of comparators [73][74] which can be replaced with inexact versions,
leading to a decrease in power dissipation. The performance of the algorithm,
with the exact and inexact comparators is compared and contrasted, in the
results section.
Chapter 5
Applications under the Tolerable Impact Category
This chapter deals with applications which can tolerate errors in decision
circuits used in them. Both the applications are image processing algorithms. The
first is a Median Filter based blurring technique which involves a majority voter in
its implementation. The second is a Frequency based image blurring technique
which involves the use of a comparator, followed by a Discrete Cosine Transform
(DCT).
5.1 Majority voter based Rank order Median Filter
Median filters are typically used in image processing operations such as blurring.
Blurring an image, by itself, has various applications ranging from noise filtering
to improving compression ratios. In this technique, a window of pixels
surrounding every pixel is taken as a list and sorted. Then the middle value is
taken as the median and is assigned to the same location in the output image.
A straightforward hardware implementation of a median filter is an extremely
complex design since it involves sorting. So, generally, order statistic [75-78]
methods are used to determine the middle value directly. This involves voting on
each bit position of the list of data. The rank order filter [75] works by first voting on
the MSB. Then the vote bit is compared with the MSB of each data word. Those data whose
MSB differs from the vote bit have their remaining bits changed to their MSB. This is
done for all bit positions. A majority voter is required for making the vote
decision. The detailed algorithm is as follows:
Algorithm 3: Rank Order Median Filter Algorithm _
1. repeat for all bit positions
1.1. Count 1s and 0s in the current bit position in all pixel data
1.2. if No. of 1's > No. of 0's
1.2.1. set variable vote=1
1.3. else
1.3.1. set variable vote=0
1.4. repeat for all pixel data
1.4.1. if bit in current bit position not equal to vote
1.4.1.1. change remaining bits in pixel to current bit
1.4.2. else
1.4.2.1. leave bits unchanged
1.5. end loop at 1.4
2. end loop at 1
_
A Hamming comparator based architecture was presented in [79]. The Hamming
comparator in this circuit is replaced by an inexact voter and comparator, and the
performance is compared with the exact version. The performance metric is
explained later.
Figure 5.1. Rank-order filter algorithm (median detection, for n=5)
The operation of the algorithm of [75] is illustrated in Fig. 5.1, which shows
five competing values (1, 2, 5, 7, and 14), coded with 4 bits each. In this example,
k=3, so the median must be detected. The value of k defines the threshold t of
the filter, that is, t=k (so t=3 in this example). Since this is a bit-serial circuit, 4
steps are required, one for each bit of the competing values. The operation of the
filter is straightforward: it verifies the number of 1’s among the input bits,
producing y=1 at the output if the number of 1’s is greater than the number of 0's,
or y=0 otherwise; now, if y=0, then the remaining bits of each input word whose
bit presently applied to the filter is 1 are set to 1, while y=1 causes the remaining
bits of each input word whose bit presently applied to the filter is 0 to be set to 0.
This can be evidenced in the last row of the second column, and the first 2 rows
of the 3rd column in Figure 5.1.
In the example of Figure 5.1, when the MSB’s (vertical box in the leftmost
stack) are presented to the filter, y=0 is produced (because there is only one 1),
and so all bits of the bottom vector are set to 1 (indicated with a horizontal box in
the next stack). When the next bit (vertical box in the second stack) is presented,
y=1 is obtained at the output (there are three 1’s now), so the remaining bits of
the top two rows are set to 0 (horizontal boxes in the next stack). When the third
bit is presented, y=0 is produced (there are two 1’s), and the remaining bits of the
bottom two rows are set to 1 (though the last row had already been set to 1).
Finally, y=1 is produced when the last bit is presented to the filter (there are
again three 1’s). The result is, therefore, y=0101 (decimal 5), which is indeed the
median.
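The bit-serial rank-order behaviour and the worked example above can be checked with a small software model. This is an illustrative sketch, not the thesis's Verilog implementation, and it models only the exact (error-free) voter.

```python
def bit_serial_median(values, width):
    """Software model of the bit-serial rank-order median filter of [75]:
    at each bit position, vote on the majority bit; every word that loses
    the vote has all its remaining (lower) bits forced to its losing bit."""
    words = [[(v >> (width - 1 - i)) & 1 for i in range(width)] for v in values]
    out = []
    for i in range(width):
        ones = sum(w[i] for w in words)
        y = 1 if ones > len(words) - ones else 0   # majority vote on column i
        out.append(y)
        for w in words:
            if w[i] != y:                          # losing word: cascade its bit down
                for j in range(i + 1, width):
                    w[j] = w[i]
    return int("".join(map(str, out)), 2)
```

Running it on the five competing values of Figure 5.1 yields 0101, i.e. 5, the true median.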
The diagram of the generic rank order filter as prescribed in [79] is
presented in Figure 5.2. The Hamming Comparator (HC) used here is a Majority
Voter, preceded by ‘n’ logic units (LUs). Additionally, an optional D-type flip-flop
can be used to store the output bit (yout <= y). There are ‘n’ digital input words,
denoted by x1, x2, ..., xn. The clock and reset signals are denoted by CLK and
RST, respectively. The inputs of the HC (d1, d2,... ,dn) are provided by the LUs,
to which the output of the HC (y) is fed back. Notice also in Figure 5.2 that the
input words are presented to the filter serially, starting with the MSB.
Figure 5.2. Complete diagram of the bit-serial generic rank filter (LU is in Figure 5.3)
Figure 5.3. Diagram of the logic unit (LU).
The Majority Voter here was designed in an inexact manner, as with the
bus coding technique. The voter required here is a 9-bit voter. Multiple inexact
variants were designed with varying levels of inexactness. Since this scenario
involves modification of data compared to the exact operation, a system level
error metric was defined. This metric is a perceptive blur metric to determine the
annoyance a blurred image can induce in a subject [95]. This metric takes values
from 0 to 1, which stand for best and worst respectively in terms of blur
perception. This is discussed in detail in the results section, along with the results
of the analysis.
5.2 Frequency based Image blurring
Image blurring is also done in frequency domain [80] in addition to spatial.
In this, the frequency content of the image is determined and only the low
frequency components are taken for generating the output image. This is done in
accordance with the nature of the frequency matrix of an image [81]. The
two-dimensional frequency matrix has as its axes the frequency components
along the x and y axes of the image. The two frequency axes, increasing from left
to right and from top to bottom, index the magnitude and phase of each frequency
component in the matrix. The origin, the (0,0) point, is the point which has
no changes in x or y axis in the spatial image and represents the DC component
of the image (the average of the entire image). It is also known that with increasing
frequencies, the magnitude falls off such that the point diagonal to the origin will
have very little amplitude. This point will represent the highest frequency content
in the image and preserves fine details. The in-between frequencies contribute to
varying levels of detail. Since blurring is about smoothing out fine detail and
having a smoother gradient, these high frequency components have to be
removed for achieving the effect.
This blurring requires having a threshold and selecting the frequency
components based on that. It needs a comparator [82]. This comparator was
designed in an inexact fashion and applied to the blurring process. Multiple
comparators with varying levels of inexactness were designed. As before the
perceptive blur metric was used to compare the outputs of both the exact and
inexact systems. The exact comparator and one of the inexact comparators are
shown in Figures 5.4 and 5.5. Complete performance analysis is presented in the
results section.
Figure 5.4. Exact Comparator
Figure 5.5. Inexact Comparator
Key: E – Both input terms are equal. Differently shaded boxes indicate the changes made.
Chapter 6
Design Methodology of Inexact Circuits
In this chapter, various design methodologies are presented to build
inexact circuits. First, a manual K-Map based approach is presented; this can
be used at the discretion of the designer, and only for smaller circuits. Next, a
heuristic framework, used for the purposes of this thesis, is presented to
generate inexact circuits with varying levels of inexactness. Finally, other processing
techniques are presented that can be utilized to obtain inexact circuits. The
image processing approach is of particular interest.
6.1 K-Map based Approach
Initially, a K-Map based approach was adopted for designing inexact
versions of smaller circuits. The following guidelines are presented to aid in such
a process. A collection of minterms which can be grouped in order to facilitate
Boolean minimization is called a grouping. Every minterm (or maxterm) can be
represented by the number of groupings it is part of, and a normalized weight
term depending on the circuit. From the above parameters, a decision can be
made on addition or removal of minterms, as follows:
a) As long as the minterm does not destructively affect the circuit complexity, it
can be removed. The removal of a minterm is destructive if it reduces the size of
a grouping or splits an existing grouping of terms into smaller groupings.
b) A new minterm can be added if it aids in adding additional groups, or in
increasing the size of an existing group.
c) On the basis of weight, a minterm can be removed if its normalized weight is
less than a certain threshold. This threshold has to be set by the system designer.
d) A simpler approach can be followed, where 'filling the holes' or 'trimming the
edges' appropriately in the K-Map can aid in deriving a smaller logic function.
This is because plugging holes most often increases groupings, and removing
corner terms can lead to a more efficient grouping.
The major drawback of the method prescribed above is that the onus is on
the system designer to manually determine weight thresholds, and to ascertain
whether terms can be added or removed from the logic function. However, an
experienced designer can make a few educated guesses, as to which terms to
modify. But this is not the expected procedure to design more complex circuits.
This can give way to an image processing based approach to creating an inexact
K-Map by blur-like processes, and to extract the new logic function from this. This
can be minimized and converted to a gate level netlist using any of the standard
Boolean minimization methods (ESPRESSO, Quine-McCluskey, etc.) mentioned
in Chapter 2. This is elaborated at the end of this chapter. But since multiple
inexact circuits are required, a heuristic design space exploration approach is
pursued.
6.2 Heuristic Framework for Inexact Circuit Generation
To generate several inexact versions of a given logic function, with respect
to a set of predetermined optimization parameters, a heuristic framework has
been designed and implemented. The framework is driven by a Genetic
Algorithm (GA), which searches the design space, using the 'survival of the fittest'
criteria, as explained in Chapter 2.
The existing literature on circuit minimization using evolutionary algorithms
[83][84], and works derived from it, like [85][86], makes use of a 2D layout of
smaller programmable logic functions, with programmable inputs for each
row/column. Since our objective is not to minimize the given function/circuit, but
to generate inexact versions of a logic function, the representation of the
chromosome differs. However, the fitness function of the existing algorithms can
be modified to provide a tolerance for functional error. Since an optimized
framework is not required for the current purpose, a simplistic approach was
taken.
6.2.1 Chromosome Representation
For the required framework, the chromosome is built from the output
vector of the given logic function. For example, a Boolean function with 4 inputs
will have a bit stream of length 16 (2^4) as its chromosome. The original
chromosome of an exact full adder will be 01101001 for the sum, and 00010111
for the carry. Multiple functions can be minimized independently or together, to
make use of common logic.
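This chromosome construction can be sketched as follows. The helper name and input ordering (MSB-first rows of the truth table) are our illustrative assumptions; the full-adder vectors match those quoted above.

```python
def chromosome(fn, n_inputs):
    """Output-vector chromosome: fn evaluated over all 2^n truth-table rows."""
    bits = []
    for row in range(2 ** n_inputs):
        ins = [(row >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        bits.append(fn(*ins))
    return "".join(map(str, bits))

# Exact full adder, expressed as Boolean functions of (a, b, cin)
full_adder_sum = lambda a, b, cin: a ^ b ^ cin
full_adder_carry = lambda a, b, cin: (a & b) | (b & cin) | (a & cin)
```

Evaluating the two full-adder outputs over all eight input rows reproduces the chromosomes 01101001 (sum) and 00010111 (carry).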
6.2.2 Fitness Function
Finding the appropriate fitness function is important since it is responsible
for quantifying the way a chromosome or individual meets the requirements of
the final goal of optimization. This function evaluates an individual taking into
account some constraints for Boolean synthesis that usually are: (1) getting the
appropriate input–output behaviour and (2) the minimum number of logic gates.
Other constraints that can be added are propagation delays or type of logic gates
available. The appropriate input-output behaviour in this work is the inexactness.
The fitness function, for the framework, is derived from the hardware requirement
of the circuit implementation of the generated function in terms of area, and the
inexactness of the function. Fitness is high when the inexactness is low and the
circuit requirement is also low. Fitness reduces with increase in any of the
parameters. So the inverse of the product of the two can be used as the fitness
function. If required, the overall system performance can also be included in this
regard. The special case of generating the exact circuit is treated separately during
simulation.
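A minimal sketch of such a fitness function, under the assumptions that inexactness is measured as the number of differing output-vector positions and that area is a synthesized-circuit cost supplied by the designer:

```python
def inexactness(chrom, exact):
    """Number of output-vector positions that differ from the exact function."""
    return sum(a != b for a, b in zip(chrom, exact))

def fitness(chrom, exact, area):
    """Inverse of (inexactness x circuit area), as described above; the
    exact chromosome (zero inexactness) is the special case handled
    separately during simulation."""
    err = inexactness(chrom, exact)
    if err == 0:
        return float("inf")
    return 1.0 / (err * area)
```

Fitness falls as either the functional error or the hardware cost grows, matching the inverse-product rule stated in the text.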
6.2.3 Genetic Operators
The selection operator is responsible for identifying the best individuals of
the population taking into account the exploitation and the exploration [83] of the
design space. This firstly allows the individuals with better fitness to survive and
reproduce more often. Secondly, it can provide the means for searching in more
areas and making it possible to find better results. The Roulette-Wheel selection
rule is used in the framework. Roulette-Wheel selection, also known as Fitness
proportionate selection, is a genetic operator used in genetic algorithms for
selecting potentially useful solutions for recombination. In fitness proportionate
selection, as in all selection methods, the fitness function assigns a fitness value
to possible solutions or chromosomes. This fitness level is used to associate a
probability of selection with each individual chromosome. If fi is the fitness of
individual i in the population, its probability of being selected is the ratio of fi over
the sum of all fi's.
The mutation operator modifies the chromosome randomly in order to
increase the search space. It can change: (1) an operator or variable and (2) a
segment in the chromosome. A variable mutating probability during the execution
of the algorithm (evolvable mutation) is more effective for hardware evolution.
The mutation rate is generally defined as the percentage of genes (bits) in a
single genotype (output vector chromosome) that are randomly mutated. The
mutation rate needs to be adjusted if the genotype length is too small, to
prevent zero mutation. Generally speaking, a mutation rate which results in 4 or 5
genes being changed in each genotype is suitable. This results in a mutation
probability of 2% for a chromosome of length 256 (which is the length of the
chromosome for the circuits used in this thesis).
Crossover is generally not preferred in GA based circuit design [6.1], as it
does not apply well when the chromosome has a predefined hardware
architecture: crossover generally produces chromosomes that differ widely
from the parent chromosomes. In the case of the proposed framework,
crossover may expose newer exploration areas, but in keeping with the
literature, crossover is not used. To compensate for the absence of crossover,
an aggressive approach was taken towards mutation, by using an 8% mutation
probability.
6.2.4 The Algorithm
The evolutionary algorithm used to produce all of the evolved circuit
designs in this work is a simple form of (1+λ) evolutionary strategy [87], where
λ is usually about 4. Experiments were reported in [88] which indicated the
efficiency of this approach. The algorithm is as follows:
Algorithm 4: Genetic Algorithm _
1. Add Exact function vector to the initial population.
2. Complete initial population with mutated versions of exact function.
3. Evaluate the fitness function for each individual in the population.
4. repeat
4.1. Copy fittest individuals into new population, following selection rule.
4.2. Mutate selected individuals, to complete the new population.
4.3. Calculate the fitness for the new population.
4.4. until convergence
_ _
The convergence condition here depends on the fitness function.
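Algorithm 4, together with the bit-flip mutation described earlier, can be sketched as a (1+λ) strategy in software. This is a toy model: a fixed generation budget stands in for the convergence test, and the function names are illustrative.

```python
import random

def mutate(chrom, rate, rng):
    """Flip each gene (output bit) independently with probability `rate`."""
    return "".join(b if rng.random() > rate else "10"[int(b)] for b in chrom)

def evolve(exact, fitness_fn, lam=4, rate=0.08, generations=50, seed=1):
    """(1+lambda) strategy per Algorithm 4: seed the population with the
    exact function, keep the fittest individual each generation, and refill
    the population with its mutants."""
    rng = random.Random(seed)
    best = exact
    for _ in range(generations):
        pop = [best] + [mutate(best, rate, rng) for _ in range(lam)]
        best = max(pop, key=fitness_fn)   # elitism: best always survives
    return best
```

Because the current best is always carried into the next population, the fitness of the returned individual can never fall below that of the starting chromosome.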
For the purpose of this work, a set of inexact versions of a Hamming threshold
voter and a comparator was generated using the framework. These inexact
circuits are tested with a variety of applications, as mentioned in the previous
chapters.
6.3 Other Methods to design Inexact Circuits
Similar to the manual pattern recognition involved in deriving a Boolean
function from a K-Map, pattern manipulation [90] methods can be applied to
automate the process of designing inexact circuits. The manual method of
looking into a K-Map to deduce ideal combinations to group minterms is just a
form of clustering. This can be used to guide group manipulation towards
deriving inexact circuits, by starting with a preconceived notion of what the
outcome should be. Influencing the method to favour some desired outcome in
this way is what introduces the inexactness. Manipulation can be done using the various
techniques one typically uses to process data. Typical data processing can vary
from simple matrix manipulation to image processing to data mining. One form of
manipulation is discussed in the next few paragraphs.
Matrix manipulation can be done by simply assuming the K-Map to be just
a matrix and operating on it. This can introduce inexactness in various ways. One
way could be to check whether the number of ‘1’s in a row/column is above some
upper threshold, and if so simply make the entire row/column ‘1’. Together with
this, a lower threshold can also be used: if the number of ‘1’s is less than the
lower threshold, the entire row/column is made ‘0’.
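The row-wise version of this manipulation can be sketched as follows (illustrative; the thresholds are designer-chosen, and the column case is symmetric, obtained by transposing the map):

```python
def threshold_rows(kmap, upper, lower):
    """Row-wise K-map manipulation: saturate rows with many 1s to all-1s,
    clear rows with few 1s to all-0s; rows in between are left untouched."""
    out = []
    for row in kmap:
        ones = sum(row)
        if ones > upper:
            out.append([1] * len(row))    # dense row: fill with 1s
        elif ones < lower:
            out.append([0] * len(row))    # sparse row: clear to 0s
        else:
            out.append(list(row))
    return out
```

Filling a nearly-full row enlarges a grouping (adding minterms), while clearing a nearly-empty row prunes stray minterms, both at the cost of inexactness.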
The K-Map can also be operated upon as an image. One immediate
technique which can be applied is compression. Image compression
reduces the size an image occupies. This is generally done in a lossy manner,
which involves removing some details in the image, like edges with sharp
transitions that are less amenable to compression. To get better compression
in such cases, blurring can be used. Blurring a K-Map removes finer details,
giving it a hazy look that occupies less space, and results in data that is no
longer just ‘1’ or ‘0’: the data elements now vary continuously between 0 and 1.
followed by thresholding to extract the information in the K-Map. Thresholding is
used to generate a binary image out of a grayscale image. This is a process
where the image values are compared with some threshold and the output image
values will be set to either 1 or 0 based on the result. So in K-Map blurring, the K-
Map image, which is originally a binary image, is blurred into a grayscale image.
This grayscale image is converted into a binary image again by thresholding.
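The blur-then-threshold pipeline can be sketched with a simple 3x3 box blur. The kernel choice and the threshold value are illustrative assumptions; any low-pass filter would serve.

```python
def blur_kmap(kmap):
    """3x3 box blur treating the K-map as a grayscale image; cells near the
    map's edge average over only the neighbours that exist."""
    rows, cols = len(kmap), len(kmap[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [kmap[i][j]
                    for i in range(max(0, r - 1), min(rows, r + 2))
                    for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(nbrs) / len(nbrs)
    return out

def threshold(img, t):
    """Convert the blurred (grayscale) map back into a binary K-map."""
    return [[1 if v >= t else 0 for v in row] for row in img]
```

An isolated minterm is smeared below the threshold and disappears, while a solid group of minterms survives; the surviving map defines the inexact logic function.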
Another field of data processing that can be useful is data mining [91].
Data mining is used to extract useful information out of patterns of data. Some
operations which are typically done are clustering, outlier detection etc. In
clustering, data clusters are formed based on some criteria. Algorithms can be
designed which manipulate the map to give better clustering of minterms in a K-Map.
This is a more scalable way of doing K-Map manipulation. Of particular
interest is outlier detection which involves the identification of elements which do
not fit into the pattern. By using outlier detection on K-Map data, those minterms
which do not fit properly into a group can be identified and pruned.
Chapter 7
Results

For the purpose of analysis, multiple inexact circuits of the required
Boolean function are generated using the heuristic framework presented in the
previous chapter. Individuals of varying inexactness were selected from the
population resulting from 1000 generations, and not from different generations.
These circuits are compared and contrasted in terms of power, and impact on
system performance. It is to be noted that the proposed method of generating
these circuits is not the optimal technique in terms of quality of results. But it
serves as a good tool for design space exploration, and analysis. All the circuits
were implemented in Verilog [92], and synthesized using the Synopsys tool chain.
The technology node utilized is 180nm. The analysis was subject to the following
operating conditions: a clock frequency of 100MHz, supply voltage of 3.3 V, and
a wire capacitance of 3 picofarads (pF). Experimental details pertaining to
specific applications are discussed in the respective sections.
7.1 Bus Coding

For the bus coding application, the stimulus for the circuits was taken
from running the SPEC2000 [93] benchmark files, and obtaining a memory trace.
The parameter to be measured is the percentage reduction in transitions on the
bus. This is a measure of the power reduction obtained from encoding data on
the bus. This analysis is represented in Figure 7.1 as the variation of percentage
reduction in transitions to the level of inexactness. In a general design scenario, a
number of smaller order circuits are used to build higher order circuits of the
same kind, as is the case with comparators, adders, decoders, and multiplexers.
Similarly, for the circuit under test, higher order voters are built from a number of
smaller order voters. The level of inexactness is with respect to the smaller order
voters, which in this case is the 8-bit voter. A 16-bit voter is built from two 8-bit
voters, and a 32-bit voter is built from four 8-bit voters, and some added circuitry.
Figure 7.1. Percentage reduction in transitions with varying inexactness
In Figure 7.1, there is some correlation between the level of inexactness
and the percentage transition reduction. The transition reduction decreases as
the level of inexactness increases. But this cannot be generalized, as seen in the
case where the bus width is 16. This can be due to the way a 16-bit voter is
constructed from 8-bit voters. This, along with the stimulus, may be the cause of
the anomaly mentioned before. But the upward trend indicates that for the
inexact 16-bit voters, the impact of inexactness on the system performance is
positive.
The results of the hardware analysis of the different voters are presented
in Figures 7.2, 7.3, and 7.4. As seen from the graphs, the circuit cost has no
correlation with the level of inexactness. This is not a surprise, as circuit
complexity depends specifically on the pattern of the input vectors which result in
the desired I/O characteristics.
Figure 7.2. Dynamic Power Dissipation varying with inexactness (8-bit)
Figure 7.3. Circuit Area varying with inexactness (8-bit)
Figure 7.4. Critical Path Delay varying with inexactness (8-bit)
The inexact voters also provide a speed advantage and an area
advantage, which directly results in lower leakage power. The power trend is
similar to the area trend as the reduction in power dissipation is obtained due to
the reduction in the circuit size required to implement the logic function. The
overall advantage of using inexact circuits can be gauged by comparing the
actual power reduction that can be obtained in a typical bus, after accounting for
the power dissipation of the required decision making and encoding circuitry.
Figure 7.5. Comparison of overall power saved by the Bus Invert with inexactness
This analysis is carried out in theory, with typical operating conditions. For
an example Intel Core i7 processor [7.3], the pin capacitance is 3 pF, with an
operating voltage of 3.3V. The clock frequency used is 100 MHz. For this
configuration, the overall reduction in power is shown in Figure 7.5. This analysis
is done with the best inexact voter in terms of system accuracy, i.e. highest
transition reduction percentage among all the inexact circuits. The huge reduction
in the overall power dissipated can be easily understood as following the law of
diminishing returns [94].
The proposed inexactness in this application is, in effect, a sub-optimal
solution. It can even be said, to an extent, that it is always better to use inexact
circuitry in a low power bus coding system, since exact systems can never save
more power than they consume. This validates the advantage of inexactness in
the bus coding domain. Other circuit parameters are also significant. The
reduced critical path delay enables a higher operating frequency. Reduced area
has a direct impact on the leakage power, making the inexact circuits more
advantageous as technology is further scaled down. The case of the serial
transition inversion mentioned in Chapter 4 shows similar results, as its
functionality is similar to that of a pattern based serial bus invert.
7.2 Rank Order based Median Filter

A Hamming comparator based rank order median filter was analyzed with
the exact and inexact versions of the Hamming comparators. The voter used here
is 9 bits wide, as it has to vote on the data of nine neighboring pixels. The stimulus used
was a set of images. The hardware analysis of the different voters is shown in
Figure 7.6 and 7.7. As evidenced in the previous section, the power dissipation,
or any other hardware parameter, does not correlate with the level of inexactness.
Figure 7.6. Dynamic Power Dissipation varying with inexactness (9-bit)
Figure 7.7. Circuit Area varying with inexactness (9-bit)
The performance metric used is a blur metric proposed in [95]. Blur
annoyance on a picture was quantified by blurring it and comparing the variations
between neighboring pixels before and after the low-pass filtering step.
Consequently, the first step consists in the computation of the intensity variations
between neighboring pixels of the input image. On this same image, a low-pass
filter is applied and the variations between the neighboring pixels are computed.
Then, the comparison between these intensity variations allows the evaluation of
the blur annoyance. Thus, a high variation between the original and the blurred
image means that the original image was sharp, whereas a slight variation
between the original and the blurred image means that the original image was
already blurred. This metric takes values from 0 to 1, which stand for best and
worst respectively in terms of blur perception.
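A loose one-dimensional sketch of the idea behind this metric follows. It is greatly simplified from [95] (the actual metric operates on 2-D images, per direction, with a specific low-pass filter); the function name and the 3-tap averaging filter are our assumptions.

```python
def blur_metric_1d(signal):
    """Re-blur the signal and compare neighbouring-sample variations before
    and after: a sharp signal loses much of its variation when blurred,
    giving a low (good) score; an already-smooth one loses little (score
    near 1, worst)."""
    blurred = [signal[0]] + [
        (signal[i - 1] + signal[i] + signal[i + 1]) / 3
        for i in range(1, len(signal) - 1)] + [signal[-1]]
    var_in = sum(abs(signal[i] - signal[i - 1]) for i in range(1, len(signal)))
    var_bl = sum(abs(blurred[i] - blurred[i - 1]) for i in range(1, len(signal)))
    if var_in == 0:
        return 1.0
    # large drop in variation -> input was sharp -> low blur score
    return 1.0 - max(0.0, var_in - var_bl) / var_in
```

A rapidly alternating (sharp) signal scores lower than a smooth ramp, mirroring how the 2-D metric ranks sharp versus pre-blurred images.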
Figure 7.8. Blur metric varying with inexactness (Median Filter)
As shown in Figure 7.8, all the inexact circuits only perform marginally
worse than the exact voter. At the same time, the other extreme can also be seen
from the last data point on both graphs, which represents an inexact voter with an
area comparable to that of the exact version, and a very large inexactness. The
blur metric of the same is higher than that of all the other voters analyzed. This
demonstrates the extreme ends of the spectrum of inexactness.
The median filter with the inexact voter shows a small number of artifacts
in filtering. These artifacts occur mainly because the filter saturates due to the
nature of the rank order filter, i.e. a wrong decision is cascaded across the word
length, which results in the resultant pixel being all 0's or all 1's. So a hybrid
system was designed which checks whether or not the output of the inexact
system saturates. If it does, the median is recomputed with an exact system. This
makes use of power gating to switch on the exact system only when needed.
This leads to an increase in area, but still yields an enormous decrease in
power consumed, since the number of such instances is very small (< 5%).
The blur metric comparison for the hybrid version is shown in Figure 7.9.
However, if artifacts can be tolerated by the designer/user, a straightforward
inexact voter is sufficient in implementing the median filter.
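The hybrid fallback described above can be sketched as follows. The `inexactMedian` here is only a toy stand-in that saturates on some inputs for demonstration; the real inexact voter is the hardware circuit, and all function names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

uint8_t exactMedian(std::vector<uint8_t> win) {
    // Partial sort up to the middle element gives the median of the window.
    std::nth_element(win.begin(), win.begin() + win.size() / 2, win.end());
    return win[win.size() / 2];
}

// Toy stand-in for the inexact voter: mimics a wrong decision cascading
// across the word length, saturating to all 1's when the top bit is set.
uint8_t inexactMedian(const std::vector<uint8_t>& win) {
    uint8_t m = exactMedian(win);
    return (m & 0x80) ? 0xFF : m;
}

uint8_t hybridMedian(const std::vector<uint8_t>& win) {
    uint8_t m = inexactMedian(win);
    if (m == 0x00 || m == 0xFF)       // saturation detected:
        return exactMedian(win);      // power-gate the exact system on
    return m;
}
```

Because the exact path is exercised on fewer than 5% of windows, its contribution to average power stays small even though it is present in area.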
Figure 7.9. Blur metric varying with inexactness (Median Filter - Hybrid)
A sample blurred image is shown in Figure 7.10. The inexact circuit used
here is represented by the first data point in the above graphs. The figure
shows the original image, the image blurred with an exact voter, the image
blurred with an inexact voter, and the image blurred with a combination of the
exact and inexact voter. As mentioned earlier, white (saturated) artifacts can
be seen in Figure 7.10 (c), which are absent in Figure 7.10 (d).
Figure 7.10. Blurred Image Samples (a) Original image (b) Exact Median Rank (c) Inexact voter rank order filtered image (d) Hybrid voter rank order filtered image
7.3 Frequency based Blur Filter
The experimental setup for a DCT based blur filter is the same as with the
median filters. The only difference is that the inexactness is introduced in the
comparator used to decide the DCT coefficients. A set of images was
processed with the exact comparator as well as its inexact variants. The
extreme ends of inexactness were included to show the nature of the system in
question.
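A frequency-domain blur of this kind can be sketched on a single 8x8 block as follows. This assumes the comparator's role is to decide which DCT coefficients to keep, zeroing those whose frequency index exceeds a cutoff before the inverse transform; the naive O(N^4) DCT and all names are illustrative only.

```cpp
#include <cmath>

constexpr int N = 8;
const double PI = std::acos(-1.0);

double scale(int k) { return k == 0 ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N); }

// Forward 2-D DCT-II of one NxN block.
void dct(const double in[N][N], double out[N][N]) {
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v) {
            double s = 0.0;
            for (int x = 0; x < N; ++x)
                for (int y = 0; y < N; ++y)
                    s += in[x][y] * std::cos((2 * x + 1) * u * PI / (2 * N))
                                  * std::cos((2 * y + 1) * v * PI / (2 * N));
            out[u][v] = scale(u) * scale(v) * s;
        }
}

// Inverse 2-D DCT of one NxN block.
void idct(const double in[N][N], double out[N][N]) {
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y) {
            double s = 0.0;
            for (int u = 0; u < N; ++u)
                for (int v = 0; v < N; ++v)
                    s += scale(u) * scale(v) * in[u][v]
                       * std::cos((2 * x + 1) * u * PI / (2 * N))
                       * std::cos((2 * y + 1) * v * PI / (2 * N));
            out[x][y] = s;
        }
}

void blurBlock(double block[N][N], int cutoff) {
    double coef[N][N];
    dct(block, coef);
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v)
            if (u + v > cutoff)   // comparator decision: discard high frequencies
                coef[u][v] = 0.0;
    idct(coef, block);
}
```

An inexact comparator in place of the `u + v > cutoff` test would occasionally keep or drop the wrong coefficient, which is exactly the perturbation studied here.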
The hardware analysis results are shown in Figures 7.11 and 7.12.
Figure 7.11. Circuit Area varying with Inexactness (Comparator)
As seen earlier, the level of inexactness has no direct bearing on the
hardware parameters. For example, the areas of the inexact circuits
represented by data points 3 (inexactness of 8) and 8 (inexactness of 36) are
similar in magnitude, although the two differ greatly in their level of
inexactness. The power dissipation follows a similar trend, as explained in the
previous sections.
Figure 7.12. Dynamic Power Dissipation varying with Inexactness (Comparator)
Figure 7.13. Blur metric varying with inexactness (DCT based Filter)
The impact of the inexactness on the system accuracy can be seen in
Figure 7.13 in the form of a blur metric comparison; the blur metric was
explained in the earlier section. It can be seen that, except for data points 1
and 5 (inexactness 7 and 17 respectively), the inexact circuits give a better blur
than the exact variant. This does not necessarily mean that the inexact blurring
supersedes the exact one, but rather that, perceptually, the inexact circuits give
a blur that is less annoying than the exact blur while requiring less area.
However, the standard deviation in the blur metric is less than 10%, which is
satisfactory.
7.4 Translation Lookaside Buffer (TLB)
A TLB simulator with the aging replacement policy was implemented in
the C++ programming language [96] for experimentation. It was analyzed with
both the exact and inexact versions of the comparator. As mentioned in Chapter
4, the bit length of the age parameter is 16 bits. The comparator is built from the
one designed for the DCT based blur given in Section 7.3. SPEC program traces
were used to obtain the page hit ratio for the exact and inexact comparators.
A uniform page size of 4 KB is used throughout this study, the TLB has 64
entries, and the analysis is done for 10,000 memory requests. Figure 7.14
shows the percentage of page faults varying with inexactness.
Figure 7.14. Page Fault varying with Inexactness
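The replacement step of such a simulator can be sketched as below, assuming a 64-entry fully associative TLB with a 16-bit age per entry: ages are halved on every access and the referenced entry's most significant bit is set, so the smallest age marks the least recently used entry. The comparator that finds this minimum is the circuit made inexact in the experiments; here it is the plain `<` comparison, and all names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TlbEntry { uint32_t vpn = 0; uint16_t age = 0; bool valid = false; };

class Tlb {
    std::vector<TlbEntry> entries;
public:
    explicit Tlb(std::size_t n = 64) : entries(n) {}

    // Returns true on a hit; on a miss, the minimum-age (or an invalid)
    // entry is evicted and replaced.
    bool access(uint32_t vpn) {
        int hit = -1;
        std::size_t victim = 0;
        for (std::size_t i = 0; i < entries.size(); ++i) {
            if (entries[i].valid && entries[i].vpn == vpn) hit = int(i);
            if (!entries[i].valid ||
                (entries[victim].valid && entries[i].age < entries[victim].age))
                victim = i;   // exact comparator decision
        }
        for (auto& e : entries) e.age >>= 1;         // age every entry
        if (hit >= 0) { entries[std::size_t(hit)].age |= 0x8000; return true; }
        entries[victim] = {vpn, 0x8000, true};
        return false;
    }
};
```

With the 4 KB page size used here, the virtual page number is simply the address shifted right by 12 bits; an inexact comparator substituted into the victim selection occasionally evicts a slightly wrong entry, producing the extra faults measured above.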
As seen in the figure, the increase in the number of page faults (at most
1.5%) is negligible compared to the gains provided in terms of the hardware
requirements, shown in Figure 7.15. Even if the overall power dissipated in
retrieving the extra pages is larger, the inexactness provides a smaller area
and a greater speed of operation; in this case, inexactness may not prove
advantageous when judged in the power domain alone. Even though Figures
7.15 and 7.16 seem to show a trend, this is not always guaranteed, as seen
from the earlier analysis.
Figure 7.15. Circuit Area varying with inexactness (Comparator)
Figure 7.16. Dynamic Power Dissipation varying with inexactness (Comparator)
As evidenced in this chapter, introducing inexactness in the above
applications provides enormous gains in hardware, while having negligible or
tolerable effects on system performance.
Chapter 8
Conclusions
This chapter presents a summary of the work done in building inexact
applications. Following this, a brief discussion on applications incurring
intolerable loss to system accuracy due to inexactness is presented. The chapter
is then concluded with inferences drawn from the results obtained.
8.1 Summary of Work
In this work, several applications were designed in an inexact fashion, with
varying levels of inexactness, and were compared and contrasted with the exact
version of the application. A novel, orthogonal level of circuit optimization for
power, using inexact logic, has been presented. A set of general guidelines is
given for designing inexact circuits on a per-case basis. To simplify the
process, a heuristic framework to generate inexact circuits has been
implemented, with which multiple inexact versions of different decision circuits
were generated and tested in application scenarios having no impact, tolerable
impact, or significant impact on system accuracy, respectively.
An inexact Hamming threshold voter and a comparator have been
designed and tested with the following set of applications. Bus coding, median
filter based image blurring, and N-Modular Redundancy (NMR) were chosen
as applications for the Hamming threshold voter. The inexact comparators were
tested with the LRU replacement algorithm for Translation Lookaside Buffers
(TLBs), and a DCT threshold based image blurring process.
A comparison has been made between the various inexact circuits and
the exact circuit, taking the levels of inexactness, the power dissipation of the
system, and the system performance/accuracy as parameters for comparison.
The results obtained promise a drastic reduction in the power dissipation of the
circuit, up to 300%, on introducing inexactness. However, the reduction in power
and the level of inexactness are not correlated: power reduction in fact occurs
only if the modification made to the logic function tends to reduce the number of
literals in the Boolean minimization. The level of inexactness, however, relates
more closely to the system performance. Although critical path delay was not a
parameter for optimization, a gain of up to 30% was observed on this front. The
area and static power dissipation also reduce significantly.
8.2 Applications affected by Inexactness
This category of applications is qualitatively discussed in this section.
Examples of N-Modular Redundancy (NMR) and Error Correcting Codes (ECC)
are considered in elucidating the impact of inexactness in such applications.
In NMR systems, the circuit which is to be fault tolerant is replicated N
times, and the outputs of these N circuits are fed to a voter circuit. The voters
used include the majority voter, the plurality voter, the median voter, etc. In this
case, the decision of the voter determines whether the fault incurred in the
system is tolerable or not. A special case of this is the popular Triple Modular
Redundancy (TMR).
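For the TMR special case, an exact majority voter can be written bitwise: each output bit is the majority of the three replicas' corresponding bits, Maj(a, b, c) = ab + bc + ca. A minimal sketch, with an illustrative function name:

```cpp
#include <cstdint>

// Bitwise TMR majority voter: a single faulty replica is outvoted bit by bit.
uint16_t tmrVote(uint16_t a, uint16_t b, uint16_t c) {
    return (a & b) | (b & c) | (c & a);
}
```

An inexact voter would replace this function with an approximation of it, which is precisely why, as argued below, doing so can defeat the purpose of the redundancy.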
Using an inexact voter in this case may not be justified, as redundancy is
applied precisely to obtain more reliability. This is especially true in critical
applications, such as control circuits in space applications and nuclear
reactors, where extra effort is put in, at increased cost, so that system failure is
minimal, owing to the cost involved in carrying out such missions. In the case
of redundant sensors, however, the output of each sensor is subject to process
variation and may not render the exactly correct output; each sensor may give
an output slightly deviated from the exact value. Thus, even the use of an exact
majority voter may not produce the expected output, so inexact voting
solutions may not hinder the correctness of the system output. This, however,
is speculative.
In the case of Error Correcting Codes, or error detection, the use of an
inexact circuit will falsely detect errors, or miss errors that would otherwise be
caught by the system. This may lead to inaccurate error correction, or to
unnecessary re-transmission of data. However, as mentioned earlier, inexact
circuits can be used in concurrent error detection [57], where they can provide
a reduced vector coverage for detecting errors.
8.3 Inference from Results
In the case of the majority voter for Bus Invert, the inexact versions of the
circuit are necessary to facilitate power reduction, more so in the case of on-chip
buses. The impact on system performance, in terms of decreased power
reduction from the coding, is negligible compared to the gains achieved in terms
of system power. An inexact voter based median rank filter performs credibly for
certain levels of inexactness, but results in a few image artifacts, for which a
workaround has been proposed in the form of a hybrid system that falls back on
an exact voter when a certain criterion is satisfied. An NMR system performs
poorly on introducing inexactness, as its primary function is to maintain system
accuracy.
For the comparator, the frequency based image blurring process proved
tolerant to the impact of inexactness; the colour distribution of the image was
retained. For the page replacement algorithm, there was a negligible increase in
the miss rate, but the penalty of incurring the misses may not be nullified by the
power reduction of the comparator.
An interesting study would be to analyze the impact of process variation on
such inexact circuits. Since the inexactness is at the functional level, such circuits
are affected by process variation in the same manner as exact circuits. However,
the errors introduced can be constructive or destructive, in the sense that the
error may fall in the part of the circuit already yielding a false result, or in the part
of the circuit that is retained from the exact version. This can be alleviated by
increasing the length of the part of the circuit (as in [97]) contributing to the
inexact output, so that it is not affected by delay errors caused by process
variation. Errors caused in the "inexact part" can be neglected, as they lead to a
double negative which eventually results in the correct output, thereby reducing
the level of inexactness without affecting the reduction in power dissipation.
In conclusion, this thesis contends that the introduction of inexactness as an
orthogonal layer of optimization can provide significant returns in terms of power
reduction and increased speed of operation. The penalty incurred on introducing
inexactness is a design trade-off which has to be analyzed on a per-case basis.
By incorporating this trade-off into the optimization parameters of the heuristic
framework, the design effort can be reduced. The final circuit generated depends
on the optimization parameters and the acceptable error tolerance of the system;
different inexact circuits can be used depending on the system requirements in
terms of accuracy and power.
Bibliography
[1] Microsoft Architecture Journal, Vol. 18, Theme: Green Computing. http://msdn.microsoft.com/en-us/architecture/bb410935.aspx
[2] Martin, T. L., Siewiorek, D. P., Smailagic, A., Bosworth, M., Ettus, M., and Warren, J., "A case study of a system-level approach to power-aware computing," ACM Trans. Embed. Comput. Syst., Vol. 2, No. 3, Aug. 2003, pp. 255-276. DOI: http://doi.acm.org/10.1145/860176.860178
[3] Mircea R. Stan and Kevin Skadron, "Guest Editors' Introduction: Power-Aware Computing," IEEE Computer, Vol. 36, No. 12, Dec. 2003, pp. 35-38. doi:10.1109/MC.2003.1250876
[4] Moore, Gordon E., "Cramming more components onto integrated circuits," Electronics Magazine, 1965, p. 4.
[5] Semiconductor Industry Association, International Technology Roadmap for Semiconductors. http://www.itrs.net/
[6] The New York Times, Technology section, April 1, 2008, "Light and Cheap, Netbooks Are Poised to Reshape PC Industry."
[7] G. Gerosa, "A Sub-1W to 2W Low-Power IA Processor for Mobile Internet Devices and Ultra-Mobile PCs in 45nm High-K Metal-Gate CMOS," Proceedings of ISSCC 2008.
[8] The Green500 List. http://www.green500.org/lists.php
[9] Sushant Sharma, Chung-Hsing Hsu, and Wu-chun Feng, "Making a case for a Green500 list," 2nd IEEE IPDPS Workshop on High-Performance, Power-Aware Computing, April 2006.
[10] Ban P. Wong, Anurag Mittal, Yu Cao, and Greg Starr, "Nano-CMOS scaling problems and implications," in Nano-CMOS Circuit and Physical Design, John Wiley & Sons, Inc.
[11] J. M. Rabaey, Digital Integrated Circuits, Prentice Hall, 1996.
[12] Li, H., Bhunia, S., Chen, Y., Vijaykumar, T. N., and Roy, K., "Deterministic Clock Gating for Microprocessor Power Reduction," Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA), IEEE Computer Society, Washington, DC, 2003, p. 113.
[13] Anita Lungu, Pradip Bose, Alper Buyuktosunoglu, and Daniel J. Sorin, "Dynamic power gating with quality guarantees," ISLPED 2009, pp. 377-382.
[14] De-Shiuan Chiou, Shih-Hsin Chen, and Chingwei Yeh, "Timing driven power gating," Proceedings of the 43rd Design Automation Conference, ACM, 2006, pp. 121-124.
[15] Perloff, Microeconomics: Theory and Applications with Calculus, Pearson, 2008, p. 178.
[16] Kaushik Roy and Sharat Prasad, Low-Power CMOS VLSI Circuit Design, Wiley, Feb. 2000.
[17] Anantha Chandrakasan and Robert W. Brodersen, Low-Power CMOS Design, Wiley, Feb. 1998.
[18] Abinesh R., Bharghava R., and M. B. Srinivas, "Transition Inversion Based Low Power Data Coding Scheme for Synchronous Serial Communication," IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2009, pp. 103-108.
[19] Abinesh R., Bharghava R., Purini, S., and Regeti, G., "Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting," Proceedings of the 23rd International Conference on VLSI Design (VLSID), IEEE Computer Society, Washington, DC, 2010, pp. 158-163.
[20] Abinesh R., Bharghava R., Suresh Purini, and Govindarajulu Regeti, "Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting," J. Low Power Electronics (JOLPE), Vol. 6, No. 3, October 2010.
[21] Hayes, J. P., Digital Logic Design, Addison Wesley, 1993.
[22] W. V. Quine, "A Way to Simplify Truth Functions," American Mathematical Monthly, Vol. 62, 1955, pp. 627-631.
[23] E. J. McCluskey, "Minimization of Boolean Functions," Bell Systems Technical Journal, Vol. 35, November 1956, pp. 1417-1444.
[24] M. Karnaugh, "The map method for synthesis of combinational logic circuits," AIEE Trans., Vol. 72, No. 9, September 1953, pp. 593-599.
[25] Brayton, R. K., Sangiovanni-Vincentelli, A. L., McMullen, C. T., and Hachtel, G. D., Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, 1984.
[26] Lewin, Douglas, Design of Logic Systems, Van Nostrand (UK), 1985.
[27] Lala, Parag K., Practical Digital Logic Design and Testing, Prentice Hall, 1996.
[28] C. A. Coello Coello, A. D. Christiansen, and A. A. Hernández, "Automated Design of Combinational Logic Circuits using Genetic Algorithms," Proceedings of the International Conference on Artificial Neural Nets and Genetic Algorithms (ICANNGA'97), Springer Verlag, 1997, pp. 335-338.
[29] J. F. Miller, D. Job, and V. K. Vassilev, "Principles in the Evolutionary Design of Digital Circuits," Genetic Programming and Evolvable Machines, Vol. 1, No. 3, 2000, pp. 259-288.
[30] C. A. Coello Coello, E. H. Luna, and A. H. Aguirre, "A Comparative Study of Encodings to Design Combinational Logic Circuits Using Particle Swarm Optimization," 2004 NASA/DoD Conference on Evolvable Hardware, Seattle, Washington, USA, June 24-26, 2004, pp. 71-78.
[31] V. G. Gudise and G. K. Venayagamoorthy, "Evolving Digital Circuits Using Particle Swarm," Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, Portland, OR, USA, July 20-24, 2003, pp. 468-472.
[32] P. W. Moore and G. K. Venayagamoorthy, "Evolving Digital Circuits Using Hybrid Particle Swarm Optimization and Differential Evolution," Conference on Neuro-Computing and Evolving Intelligence, Auckland, New Zealand, December 13-15, 2004, pp. 71-73.
[33] Kabanets, Valentine and Cai, Jin-Yi, "Circuit minimization problem," Proc. 32nd Symposium on Theory of Computing, Portland, Oregon, USA, 2000, pp. 73-79. doi:10.1145/335305.335314, ECCC TR99-045
[34] M. Mano and C. Kime, Logic and Computer Design Fundamentals (Fourth Edition), p. 54.
[35] Enderton, H., A Mathematical Introduction to Logic (Second Edition), Harcourt Academic Press, 2001.
[36] Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Co., 1989.
[37] King, D. B. S., Simpson, R. J., Moore, C., and MacDiarmid, I. P., "Digital n-tuple Hamming comparator for weightless systems," Electron. Lett., Vol. 34, No. 22, 1998, pp. 2103-2104.
[38] Pedroni, V. A., "Compact fixed-threshold and two-vector Hamming comparators," Electron. Lett., Vol. 39, No. 24, 2003, pp. 1705-1706.
[39] Piestrak, S. J., "Design of self-testing checkers for unidirectional error detecting codes," Scientific Papers of the Institute of Technical Cybernetics of Wroclaw University of Technology, No. 92, Series: Monographs No. 24, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 1995.
[40] King, D. B. S., Simpson, R. J., Moore, C., and MacDiarmid, I. P., "Hamming value comparator hierarchies," Electron. Lett., Vol. 35, No. 11, 1999, pp. 910-911.
[41] Asada, K., Komatsu, S., and Ikeda, M., "Associative memory with minimum Hamming distance detector and its application to bus data encoding," Proc. IEEE Asia-Pacific ASIC Conf. (AP-ASIC'99), 1999.
[42] Barral, C., Coron, J.-S., and Naccache, D., "Externalised fingerprint matching," Lect. Notes Comput. Sci., Vol. 3072, 2004, pp. 309-315.
[43] Chen, K., "Bit-serial realisations of a class of nonlinear filters based on positive Boolean functions," IEEE Trans. Circuits Syst., Vol. 36, No. 6, 1989, pp. 785-794.
[44] Karaman, M., Onural, L., and Atalar, A., "Design and implementation of a general purpose median filter unit in CMOS VLSI," IEEE J. Solid-State Circuits, Vol. 25, No. 2, 1990, pp. 505-513.
[45] Chua-Chin Wang, Ya-Hsin Hsueh, Hsin-Long Wu, and Chih-Feng Wu, "A Fast Dynamic 64-bit Comparator with Small Transistor Count," VLSI Design, Vol. 14, No. 4, 2002, pp. 389-395.
[46] Minsu Kim, Joo-Young Kim, and Hoi-Jun Yoo, "A 1.55ns 0.015 mm2 64-bit quad number comparator," International Symposium on VLSI Design, Automation and Test (VLSI-DAT '09), 2009, pp. 283-286.
[47] Dorit S. Hochbaum (ed.), Approximation Algorithms for NP-Hard Problems, PWS Publishing Company, 1997, Chapter 9: "Various Notions of Approximations: Good, Better, Best, and More." ISBN 0-534-94968-1.
[48] Vazirani, Vijay V., Approximation Algorithms, Springer, Berlin, 2003. ISBN 3540653678.
[49] Krishna V. Palem, Lakshmi N. Chakrapani, Zvi M. Kedem, Lingamneni Avinash, and Kirthi Krishna Muntimadugu, "Sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects," CASES 2009, pp. 1-10.
[50] Lakshmi N. Chakrapani, Pinar Korkmaz, Bilge E. S. Akgul, and Krishna V. Palem, "Probabilistic system-on-a-chip architectures," ACM Trans. Design Autom. Electr. Syst., Vol. 12, No. 3, 2007.
[51] Bilge E. S. Akgul, Lakshmi N. Chakrapani, Pinar Korkmaz, and Krishna V. Palem, "Probabilistic CMOS Technology: A Survey and Future Directions," VLSI-SoC 2006, pp. 1-6.
[52] D. MacKay, "Bayesian interpolation," Neural Computation, Vol. 4, No. 3, 1992.
[53] H. Fuks, "Non-deterministic density classification with diffusive probabilistic cellular automata," Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, Vol. 66, 2002.
[54] E. Gelenbe, "Random neural networks with negative and positive signals and product form solution," Neural Computation, Vol. 1, No. 4, 1989, pp. 502-511.
[55] Y. Z. Ding and M. O. Rabin, "Hyper-Encryption and everlasting security," Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science, Vol. 2285, 2002, pp. 1-26.
[56] D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG4-AVC standard and its fidelity range extensions," IEEE Communications Magazine, Sept. 2005.
[57] Choudhury, M. R. and Mohanram, K., "Approximate logic circuits for low overhead, non-intrusive concurrent error detection," Proceedings of the Conference on Design, Automation and Test in Europe (DATE '08), ACM, New York, NY, 2008, pp. 903-908. DOI: http://doi.acm.org/10.1145/1403375.1403593
[58] M. R. Stan and W. P. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 1, March 1995, pp. 49-58.
[59] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems," IEEE 7th Great Lakes Symposium on VLSI, Urbana, IL, Mar. 1997, pp. 77-82.
[60] E. Musoll, T. Lang, and J. Cortadella, "Working-Zone Encoding for reducing the energy in microprocessor address buses," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 6, No. 4, Dec. 1998.
[61] El-Moursy, M. A. and Friedman, E. G., "Wire shaping of RLC interconnects," Integration, the VLSI Journal, Vol. 40, No. 4, Jul. 2007, pp. 461-472.
[62] El-Moursy, M. A. and Friedman, E. G., "Optimum wire sizing of RLC interconnect with repeaters," Proceedings of the 13th ACM Great Lakes Symposium on VLSI (GLSVLSI '03), ACM, New York, NY, 2003, pp. 27-32.
[63] W. Fornaciari, M. Polentarutti, D. Sciuto, and C. Silvano, "Power Optimization of System-Level Address Buses Based on Software Profiling," CODES, 2000, pp. 29-33.
[64] E. Musoll, T. Lang, and J. Cortadella, "Exploiting the locality of memory references to reduce the address bus energy," Proceedings of the International Symposium on Low Power Electronics and Design, Monterey, CA, August 1997, pp. 202-207.
[65] Jun Yang, Rajiv Gupta, and Chuanjun Zhang, "Frequent value encoding for low power data buses," ACM Trans. Design Autom. Electr. Syst., Vol. 9, No. 3, 2004, pp. 354-384.
[66] C. Su, C. Tsui, and A. Despain, "Saving power in the control path of embedded processors," IEEE Design and Test of Computers, Vol. 11, No. 4, 1994, pp. 24-30.
[67] Wei-Chung Cheng and Massoud Pedram, "Memory Bus Encoding for Low Power: A Tutorial," ISQED 2001, pp. 199-204.
[68] Tanenbaum, Andrew S., Modern Operating Systems (Second Edition), Prentice-Hall, New Jersey, 2001.
[69] Aho, Denning, and Ullman, "Principles of Optimal Page Replacement," Journal of the ACM, Vol. 18, No. 1, January 1971, pp. 80-93.
[70] Elizabeth J. O'Neil et al., "The LRU-K page replacement algorithm for database disk buffering," ACM SIGMOD Conference, 1993, pp. 297-306.
[71] Song Jiang and Xiaodong Zhang, "LIRS: a Low Inter-reference Recency Set replacement," SIGMETRICS 2002.
[72] Richard W. Carr and John L. Hennessy, "WSCLOCK: a simple and effective algorithm for virtual memory management," 1981.
[73] Ballapuram, C., Puttaswamy, K., Loh, G. H., and Lee, H. S., "Entropy-based low power data TLB design," Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2006, pp. 304-311.
[74] Rhodehamel, Michael W., "The Bus Interface and Paging Units of the i860 Microprocessor," Proc. IEEE International Conference on Computer Design, 1989, pp. 380-384.
[75] B. K. Kar and D. K. Pradhan, "A new algorithm for order statistic and sorting," IEEE Transactions on Signal Processing, Vol. 41, No. 8, Aug. 1993, pp. 2688-2699.
[76] K. Oflazer, "Design and implementation of a single-chip 1-D median filter," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 31, No. 5, Oct. 1983, pp. 1164-1168.
[77] J. P. Fitch, E. J. Coyle, and N. C. Gallagher, "Median filtering by threshold decomposition," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, Dec. 1984, pp. 1183-1188.
[78] L. W. Chang and S. S. Yu, "A new implementation of generalized order statistic filter by threshold decomposition," IEEE Transactions on Signal Processing, Vol. 40, No. 12, Dec. 1992, pp. 3062-3066.
[79] Volnei A. Pedroni, "Compact Hamming-comparator-based rank order filter for digital VLSI and FPGA implementations," ISCAS (2) 2004, pp. 585-588.
[80] IEEE Trans. Circuits and Systems, Special Issue on Digital Filtering and Image Processing, Vol. CAS-2, 1975.
[81] R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4.
[82] R. Hamming, Digital Filters, Prentice-Hall, 1983.
[83] Miller, J. F., Job, D., and Vassilev, V. K., "Principles in the Evolutionary Design of Digital Circuits, Part I," Genetic Programming and Evolvable Machines, Vol. 1, No. 1-2, Apr. 2000, pp. 7-35.
[84] Miller, J. F., Job, D., and Vassilev, V. K., "Principles in the Evolutionary Design of Digital Circuits, Part II," Genetic Programming and Evolvable Machines, Vol. 1, No. 3, Jul. 2000, pp. 259-288.
[85] Miller, J. F. and Harding, S. L., "Cartesian genetic programming," Proceedings of the 2008 GECCO Conference Companion on Genetic and Evolutionary Computation, ACM, New York, NY, 2008, pp. 2701-2726.
[86] Xu, H., Ding, Y., and Hu, Z., "Adaptive immune genetic algorithm for logic circuit design," Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation, Shanghai, China, June 12-14, 2009, pp. 639-644.
[87] T. Back, F. Hoffmeister, and H.-P. Schwefel, "A survey of evolution strategies," Proceedings of the 4th International Conference on Genetic Algorithms, 1991, pp. 2-9.
[88] Miller, J. F., "An empirical study of the efficiency of learning Boolean functions using a Cartesian Genetic Programming approach," Proceedings of the 1st Genetic and Evolutionary Computation Conference (GECCO'99), pp. 1135-1142.
[89] Krohling, R., Zhou, Y., and Tyrrell, A., "Evolving FPGA-based robot controllers using an evolutionary algorithm," Proc. Intl. Conf. on Artificial Immune Systems, 2002, pp. 41-46.
[90] Bhagat, Phiroz, Pattern Recognition in Industry, Elsevier. ISBN 0-08-044538-1.
[91] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, 2005. ISBN 0-321-32136-7.
[92] Thomas, Donald and Moorby, Phillip, The Verilog Hardware Description Language, Kluwer Academic Publishers, Norwell, MA.
[93] http://www.spec2000.com/
[94] Samuelson and Nordhaus, Microeconomics (17th ed.), McGraw Hill, 2001, p. 110.
[95] Frederique Crete, Thierry Dolmiere, Patricia Ladret, and Marina Nicolas, "The blur effect: perception and estimation with a new no-reference perceptual blur metric," Proceedings of SPIE 6492, 64920I, 2007.
[96] Stroustrup, Bjarne, The C++ Programming Language (Third Edition), Addison-Wesley, 1997.
[97] Banerjee, N., Karakonstantis, G., and Roy, K., "Process variation tolerant low power DCT architecture," Proceedings of the Conference on Design, Automation and Test in Europe, 2007, pp. 630-635.
[98] Latif-Shabgahi, G., Bass, J. M., and Bennett, S., "Efficient implementation of inexact majority and median voters," Electronics Letters, Vol. 36, No. 15, 20 Jul. 2000, pp. 1326-1328.
[99] Sahni, S. and Gonzales, T., "P-complete problems and approximate solutions," Proceedings of the 15th Annual Symposium on Switching and Automata Theory, 1974.
[100] Hromkovič, J., Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics, Springer-Verlag, New York, 2001.
[101] I. B. Gurevich and Yu. I. Zhuravlev, "Minimization of Boolean functions and effective recognition algorithms," Cybernetics and Systems Analysis, Vol. 10, No. 3, 1974, pp. 393-397.
[102] Oppenheim, A. V., Schafer, R. W., and Buck, J. R., Discrete-Time Signal Processing (2nd Ed.), Prentice-Hall, 1999.