7/28/2019 Project Report Multiplication
1/32
CONTENTS

ABSTRACT
LIST OF SYMBOLS AND ACRONYMS
VHDL
INTRODUCTION TO FLOATING POINT NUMBERS
INTRODUCTION
HISTORY
RANGE OF FPN
FPN PRECISION
IEEE 754 FPN STANDARD
FPN REPRESENTATION
COMPUTER REPRESENTATION
IEEE FPN REPRESENTATION
ATTRIBUTES & ROUNDING
FPN ARITHMETIC
FPN REPRESENTATION FORMAT
PARAMETERS FOR THE IEEE 754 FLOATING-POINT STANDARD
FPN MULTIPLICATION
DENORMALS
FPN MULTIPLICATION ALGORITHM
HARDWARE OF FLOATING POINT MULTIPLIER
UNSIGNED MULTIPLIER
ADDITION PROCESS
NORMALIZER
UNDERFLOW/OVERFLOW DETECTION
MULTIPLICATION FLOWCHART
STRUCTURE OF MULTIPLICATION
FLOATING POINT MULTIPLIER ARCHITECTURE
PROPOSED CIFM ARCHITECTURE
Real-life application
Optimization criteria
Application
APPENDIX
CONCLUSION
REFERENCES
FLOATING POINT MULTIPLICATION USING VHDL

A report submitted in partial fulfillment of the requirements for the Degree
of
Bachelor of Technology
in
Electronics and Communication Engineering

Under the Guidance of
Manas Ranjan Tripathy
Department of Electronics and Communication Engineering

INSTITUTE OF TECHNICAL EDUCATION & RESEARCH, BHUBANESWAR
(SIKSHA O ANUSANDHAN UNIVERSITY, ODISHA)
2012

Submitted by:
Bibhu Bhushan Panda (0911016214)
Sadbhab Patra (0911016231)
Chandrakanta Parida (1021016041)
Sweta Chandan (0911016244)
INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH
CERTIFICATE
This is to certify that the project titled FLOATING POINT MULTIPLICATION USING VHDL is the bonafide work of group C4, in partial fulfillment for the award of the Degree of Bachelor of Technology in Electronics and Communication Engineering, conducted under my supervision.
Project guide:
Mr. Manas Ranjan Tripathy
(Lecturer)
Department of Electronics and Communication Engineering
ITER, BHUBANESWAR
DECLARATION
We certify that
a. The work contained in this report is original and has been done by us under the guidance of
our supervisor.
b. The work has not been submitted to any other Institute for any degree or diploma.
c. We have followed the guidelines provided by the Institute in preparing the report.
d. We have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
e. Whenever we have used materials (data, theoretical analysis, figures, and text) from other sources, we have given due credit to them by citing them in the text of the report and giving their details in the references. Further, we have taken permission from the copyright owners of the sources, whenever necessary.
BIBHU BHUSHAN PANDA (0911016214)
SADBHAB PATRA (0911016231)
CHANDRAKANTA PARIDA (1021016041)
SWETA CHANDAN (0911016244)
ACKNOWLEDGMENT

We would like to thank Mr. Manas Ranjan Tripathy for providing us this opportunity to present the project on FLOATING POINT MULTIPLICATION USING VHDL.

We would like to thank Prof. Bibhu Prasad Mohanty (HOD) and Prof. Niva Das (Associate Dean), along with Mr. Manas Ranjan Tripathy, for their constant support and guidance. We would also like to extend our gratitude to the faculty and staff of the Department of Electronics and Communication Engineering for their valuable insights, which made this project a success.

Lastly, we thank one and all who helped in building this project and guided us in all aspects of its success.

BIBHU BHUSHAN PANDA (0911016214)
SADBHAB PATRA (0911016231)
CHANDRAKANTA PARIDA (1021016041)
SWETA CHANDAN (0911016244)
ABSTRACT
Shrinking feature sizes give designers more headroom to extend the functionality of microprocessors. As processor support for decimal floating-point arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common decimal floating-point arithmetic operations.

This report presents designs for a decimal floating-point multiplier. Binary floating-point arithmetic is usually sufficient for scientific and statistical applications. However, it is not sufficient for many commercial applications and database systems, in which operations often need to mirror manual calculations. Therefore, these applications often use software to perform decimal floating-point arithmetic operations.

The IEEE 754 standard provides a method for computation with floating-point numbers that yields the same result whether the processing is done in hardware, software, or a combination of the two. The results of the computation are identical, independent of implementation, given the same input data. Errors, and error conditions, in the mathematical processing are reported in a consistent manner regardless of implementation.

Keywords: exponent, normalized value, subnormal numbers.
LIST OF SYMBOLS

Serial No.  Symbol  Meaning
1           X       Real number
2           M       Significand
3           E       Exponent

LIST OF ACRONYMS

Serial No.  Acronym  Meaning
1           OFL      Overflow level
2           UFL      Underflow level
3           NaN      Not a Number
Chapter 1
1. VHDL

The VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) was first proposed in 1981. The development of VHDL was originated by IBM, Texas Instruments, and Intermetrics in 1983. The result, contributed to by many participating EDA (Electronic Design Automation) groups, was adopted as the IEEE 1076 standard in December 1987. VHDL is intended to provide a tool that the digital systems community can use to distribute their designs in a standard format. Using VHDL, designers are able to talk to each other about their complex digital circuits in a common language without the difficulties of revealing technical details.

As a standard description of digital systems, VHDL is used as input and output to various simulation, synthesis, and layout tools. The language provides the ability to describe systems, networks, and components at a very high behavioral level as well as at a very low gate level. It also supports a top-down design methodology and environment. Simulations can be carried out at any level, from a general functional analysis to a very detailed gate-level waveform analysis.
1.1 INTRODUCTION TO FLOATING POINT NUMBERS

1. INTRODUCTION

In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10, or 16. The typical number that can be represented exactly is of the form:

significand x base^exponent
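For instance, a value can be rebuilt from these three parts directly. The sketch below is illustrative Python, not part of the original report; the helper name is ours:

```python
# Rebuild a value from its significand, base, and exponent parts.
def from_parts(significand: float, base: int, exponent: int) -> float:
    return significand * base ** exponent

print(from_parts(1.5, 2, 10))    # 1536.0  (1.5 x 2^10)
print(from_parts(1.25, 2, -2))   # 0.3125  (1.25 x 2^-2)
```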
The term floating point refers to the fact that the radix point (decimal point or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated separately in the internal representation, and floating-point representation can thus be thought of as a computer realization of scientific notation. Over the years, a variety of floating-point representations have been used in computers. However, since the 1990s, the most commonly encountered representation is that defined by the IEEE 754 standard.

The advantage of floating-point representation over fixed-point and integer representation is that it can support a much wider range of values. For example, a fixed-point representation that has seven decimal digits with two decimal places can represent the numbers 12345.67, 123.45, 1.23, and so on, whereas a floating-point representation (such as the IEEE 754 decimal32 format) with seven decimal digits could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, and so on. The floating-point format needs slightly more storage (to encode the position of the radix point), so when stored in the same space, floating-point numbers achieve their greater range at the expense of precision.
i. History

In 1914 Leonardo Torres y Quevedo designed an electro-mechanical version of Charles Babbage's Analytical Engine which included floating-point arithmetic. In 1938, Konrad Zuse of Berlin completed the Z1, the first mechanical binary programmable computer; it was, however, unreliable in operation. It worked with 22-bit binary floating-point numbers having a 7-bit signed exponent, a 15-bit significand (including one implicit bit), and a sign bit. The memory used sliding metal parts to store 64 words of such numbers. The relay-based Z3, completed in 1941, had representations for plus and minus infinity. It implemented defined operations with infinity, such as 1/inf = 0, and stopped on undefined operations such as 0/0. It also implemented the square root operation in hardware.

Zuse also proposed, but did not complete, carefully rounded floating-point arithmetic that would have included infinities and NaNs, anticipating features of the IEEE floating-point standard by four
decades. By contrast, von Neumann recommended against floating point for the 1951 IAS machine, arguing that fixed-point arithmetic was preferable.

The first commercial computer with floating-point hardware was Zuse's Z4 computer, designed in 1942 to 1945. The Bell Laboratories Mark V computer implemented decimal floating point in 1946.

Prior to the IEEE 754 standard, computers used many different forms of floating point. These differed in the word sizes, the format of the representations, and the rounding behavior of operations. These differing systems implemented different parts of the arithmetic in hardware and software, with varying accuracy.

The IEEE 754 standard was created in the early 1980s, after word sizes of 32 bits (or 16 or 64) had been generally settled upon. It was based on a proposal from Intel, who were designing the i8087 numerical coprocessor. Prof. W. Kahan was the primary architect behind this proposal, along with his student Jerome Coonen at U.C. Berkeley and visiting Prof. Harold Stone, for which he was awarded the 1989 Turing Award. Among the innovations are these:

-- A precisely specified encoding of the bits, so that all compliant computers would interpret bit patterns the same way. This made it possible to transfer floating-point numbers from one computer to another.

-- A precisely specified behavior of the arithmetic operations: arithmetic operations were required to be correctly rounded, i.e. to give the same result as if infinitely precise arithmetic was used and then rounded. This meant that a given program, with given data, would always produce the same result on any compliant computer. This helped reduce the almost mystical reputation that floating-point computation had for seemingly nondeterministic behavior.

-- The ability of exceptional conditions (overflow, divide by zero, etc.) to propagate through a computation in a benign manner and then be handled by the software in a controlled way.
ii. Range of floating-point numbers

By allowing the radix point to be adjustable, floating-point notation allows calculations over a wide range of magnitudes, using a fixed number of digits, while maintaining good precision. For example, in a decimal floating-point system with three digits, the multiplication that humans would write as

0.12 x 0.12 = 0.0144

would be expressed as

(1.20 x 10^-1) x (1.20 x 10^-1) = (1.44 x 10^-2).

In a fixed-point system with the decimal point at the left, it would be

0.120 x 0.120 = 0.014.

A digit of the result was lost because of the inability of the digits and decimal point to 'float' relative to each other within the digit string.
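The same contrast can be reproduced with Python's decimal module (an illustrative sketch, not part of the report):

```python
from decimal import Decimal, getcontext

# Emulate the three-significant-digit decimal floating-point system from the text.
getcontext().prec = 3
product = Decimal("0.12") * Decimal("0.12")
print(product)   # 0.0144 -- all three significant digits survive

# A fixed-point system with three decimal places loses the last digit:
fixed = (Decimal("0.120") * Decimal("0.120")).quantize(Decimal("0.001"))
print(fixed)     # 0.014
```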
The range of floating-point numbers depends on the number of bits or digits used for the representation of the significand (the significant digits of the number) and for the exponent. On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit. Positive floating-point numbers in this format have an approximate range of 10^-308 to 10^308, because the range of the exponent is [-1022, 1023] and 308 is approximately log10(2^1023). The complete range of the format is from about -10^308 through +10^308.

The number of normalized floating-point numbers in a system F(B, P, L, U) (where B is the base of the system, P is the precision of the system to P digits, L is the smallest exponent representable in the system, and U is the largest exponent used in the system) is:

2 (B - 1) B^(P-1) (U - L + 1) + 1.

There is a smallest positive normalized floating-point number, Underflow level = UFL = B^L, which has a 1 as the leading digit and 0 for the remaining digits of the significand, and the smallest possible value for the exponent.
There is a largest floating-point number, Overflow level = OFL = B^(U+1) (1 - B^-P), which has B - 1 as the value for each digit of the significand and the largest possible value for the exponent.

In addition, there are representable values strictly between -UFL and UFL: namely, zero and negative zero, as well as subnormal numbers.
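These quantities are easy to check numerically. The sketch below (illustrative Python, not from the report) evaluates UFL, OFL, and the count of normalized numbers for a toy system F(10, 3, -5, 5), and compares double precision against sys.float_info:

```python
import sys

# Toy decimal system F(B=10, P=3, L=-5, U=5), per the formulas in the text.
B, P, L, U = 10, 3, -5, 5

ufl = B ** L                          # smallest positive normalized number
ofl = B ** (U + 1) * (1 - B ** -P)    # largest representable number
count = 2 * (B - 1) * B ** (P - 1) * (U - L + 1) + 1   # including zero

print(ufl)    # 1e-05
print(ofl)    # 999000.0  (significand 9.99 times 10^5)
print(count)  # 19801

# IEEE double precision: UFL = 2^-1022, exponents up to 1023.
assert sys.float_info.min == 2.0 ** -1022
assert sys.float_info.max_exp == 1024
```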
iii. Floating-point precisions

IEEE 754 defines:

16-bit: Half (binary16)
32-bit: Single (binary32), decimal32
64-bit: Double (binary64), decimal64
128-bit: Quadruple (binary128), decimal128
Extended precision formats
Other: Minifloat, arbitrary precision

The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754 (also known as IEC 60559). This standard is followed by almost all modern machines. Notable exceptions include IBM mainframes, which support IBM's own format (in addition to the IEEE 754 binary and decimal formats), and Cray vector machines, where the T90 series had an IEEE version but the SV1 still uses Cray floating-point format.

The standard provides for many closely related formats, differing in only a few details. Five of these formats are called basic formats and others are termed extended formats; three of these are especially widely used in computer hardware and languages:

-- Single precision, called "float" in the C language family, and "real" or "real*4" in Fortran. This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).

-- Double precision, called "double" in the C language family, and "double precision" or "real*8" in Fortran. This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits).
-- Double extended format, an 80-bit floating-point value. This is implemented on most personal computers but not on other devices. Sometimes "long double" is used for this in the C language family (the C99 and C11 standards' "IEC 60559 floating-point arithmetic extension", Annex F, recommend the 80-bit extended format be provided as "long double" when available), though "long double" may be a synonym for "double" or may stand for quadruple precision. Extended precision can help minimise accumulation of round-off error in intermediate calculations.

Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format. Furthermore, a wide range of powers of 2 times such a number can be represented. These properties are sometimes used for purely integer data, to get 53-bit integers on platforms that have double precision floats but only 32-bit integers. To a rough approximation, the bit representation of an IEEE binary floating-point number is proportional to its base 2 logarithm.
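These exact-integer limits can be observed directly (illustrative Python; the `to_f32` helper simulating single precision is ours):

```python
import struct

# Round-trip a value through single precision (32 bits) to observe its limit.
def to_f32(x) -> float:
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

# Doubles represent every integer up to 2**53 exactly; 2**53 + 1 is the first gap.
assert float(2 ** 53) == 2 ** 53
assert float(2 ** 53 + 1) == float(2 ** 53)   # rounds back down

# Singles are exact only up to 2**24.
assert to_f32(2 ** 24) == 2 ** 24
assert to_f32(2 ** 24 + 1) == 2 ** 24         # 16777217 is not representable
```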
Chapter 2
2. IEEE-754 FLOATING-POINT STANDARD

In the early days of digital computers, it was quite common for machines from different vendors to have different word lengths and unique floating-point formats. This caused many problems, especially in porting programs between different machines (designs). A main objective in developing a floating-point representation standard is to make numerical programs predictable and completely portable, in the sense of producing identical results when run on different machines. The IEEE-754 floating-point standard, formally named ANSI/IEEE Std 754-1985 and introduced in 1985, tried to solve these problems. A main objective of this standard is that an implementation of a floating-point system conforming to it can be realized entirely in software, entirely in hardware, or in any combination of the two. The standard specifies two formats for floating-point numbers, basic (single precision) and extended (double precision); it also specifies the basic operations for both formats, including addition and subtraction. Finally, it describes the different floating-point exceptions and their handling, including non-numbers (NaNs).
Table 1: Features of the ANSI/IEEE Standard Floating-Point Representation

Feature               Single                      Double
Word length, bits     32                          64
Significand bits      23 + 1 (hidden)             52 + 1 (hidden)
Significand range     [1, 2 - 2^-23]              [1, 2 - 2^-52]
Exponent bits         8                           11
Exponent bias         127                         1023
Zero (+/-0)           e + bias = 0, f = 0         e + bias = 0, f = 0
Denormal              e + bias = 0, f /= 0        e + bias = 0, f /= 0
Infinity (+/-inf)     e + bias = 255, f = 0       e + bias = 2047, f = 0
Not-a-Number (NaN)    e + bias = 255, f /= 0      e + bias = 2047, f /= 0
Minimum               2^-126 ~ 1.2 x 10^-38       2^-1022 ~ 2.2 x 10^-308
Maximum               2^128 ~ 3.4 x 10^38         2^1024 ~ 1.8 x 10^308
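The single-precision field layout in Table 1 can be checked by unpacking the raw bits (an illustrative Python sketch; the function name is ours, and the masks assume the 1/8/23 layout above):

```python
import struct

def decode_single(x: float):
    """Split an IEEE 754 single-precision value into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF   # 8 exponent bits, bias 127
    fraction = bits & 0x7FFFFF         # 23 stored significand bits
    return sign, biased_exp, fraction

print(decode_single(1.0))            # (0, 127, 0): exponent 0 + bias 127
print(decode_single(-2.5))           # (1, 128, 2097152): -1.25 x 2^1
print(decode_single(float("inf")))   # (0, 255, 0): e + bias = 255, f = 0
```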
PROBLEMS ASSOCIATED WITH FLOATING POINT ADDITION
For the inputs, the exponents of the numbers may be dissimilar, and dissimilar exponents cannot be added directly. So the first problem is equalizing the exponents: the exponent of the smaller number must be increased until it equals that of the larger number. Then the significands are added. The fixed sizes of the mantissa and exponent of a floating-point number cause several problems to arise during addition and subtraction. The second problem is overflow of the mantissa; it can be solved by rounding the result. The third problem is overflow and underflow of the exponent. The former occurs when the mantissa overflows and an adjustment in the exponent is attempted; underflow can occur while normalizing a small result. Unlike the case of fixed-point addition, an overflow in the mantissa is not disabling; simply shifting the mantissa and increasing the exponent can compensate for such an overflow. Another problem is the normalization of addition and subtraction results: the sum or difference of two significands may be a number which is not in normalized form, so it should be normalized before returning the result.
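The alignment and renormalization steps above can be sketched on decimal significand/exponent pairs (illustrative Python, not the report's VHDL; truncating arithmetic is assumed for simplicity):

```python
def fp_add(m1: int, e1: int, m2: int, e2: int, digits: int = 3):
    """Add two decimal values m * 10^e with a fixed-width significand of `digits` digits."""
    # Step 1: equalize exponents by shifting the smaller operand right.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 //= 10 ** (e1 - e2)
    # Step 2: add the aligned significands.
    m, e = m1 + m2, e1
    # Step 3: renormalize if the significand overflowed its digit budget.
    while m >= 10 ** digits:
        m, e = m // 10, e + 1   # truncate, bump exponent
    return m, e

# 999 + 100 overflows three digits, so the exponent is bumped: 1099 -> 109 x 10^1
print(fp_add(999, 0, 100, 0))   # (109, 1)
```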
2.1 Floating Point Representation

i. Computer Representation of Numbers

Computers which work with real arithmetic use a system called floating point. Suppose a real number x has the binary expansion

x = +/- m x 2^E, where 1 <= m < 2, and

m = (b0.b1b2b3...)_2 with b0 = 1.

To store a number in floating point representation, a computer word is divided into 3 fields, representing the sign, the exponent E, and the significand m respectively. A 32-bit word could be divided into fields as follows: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand. Since the exponent field is 8 bits, it can be used to represent exponents between -128 and 127. The significand field can store the first 23 bits of the binary representation of m, namely b0.b1...b22.
FORMATS: This clause defines floating-point formats, which are used to represent a finite subset of real numbers. Formats are characterized by their radix, precision, and exponent range, and each format can represent a unique set of floating-point data. All formats can be supported as arithmetic formats; that is, they may be used to represent floating-point operands. Specific fixed-width encodings for binary and decimal formats are defined in this clause for a subset of the formats. These interchange formats are identified by their size and can be used for the exchange of floating-point data between implementations.

Five basic formats are defined: three binary formats, with encodings in lengths of 32, 64, and 128 bits, and two decimal formats, with encodings in lengths of 64 and 128 bits. Additional arithmetic formats are recommended for extending these basic formats. The choice of which of this standard's formats to support is language-defined or, if the relevant language standard is silent or defers to the implementation, implementation-defined. The names used for formats in this standard are not necessarily those used in programming environments.
ii. IEEE Floating Point Representation

In the 1960s and 1970s, each computer manufacturer developed its own floating point system, leading to a lot of inconsistency in how the same program behaved on different machines. For example, although most machines used binary floating point systems, the IBM 360/370 series, which dominated computing during this period, used a hexadecimal base, i.e. numbers were represented as +/- m x 16^E. Other machines, such as HP calculators, used a decimal floating point system. Through the efforts of several computer scientists, particularly W. Kahan, a binary floating point standard was developed in the early 1980s and, most importantly, followed very carefully by the principal manufacturers of floating point chips for personal computers, namely Intel and Motorola. This standard has become known as the IEEE floating point standard, since it was developed and endorsed by a working committee of the Institute of Electrical and Electronics Engineers.

The IEEE standard has three very important requirements:

-- consistent representation of floating point numbers across all machines adopting the standard
-- correctly rounded arithmetic
-- consistent and sensible treatment of exceptional situations such as division by zero
We start with the following observation. In the last section, we chose to normalize a nonzero number x so that x = +/- m x 2^E, where 1 <= m < 2, i.e.

m = (b0.b1b2...)_2

with b0 = 1. In the simple floating point model, we stored the leading nonzero bit b0 in the first position of the field provided for m. Note, however, that since we know this bit has the value one, it is not necessary to store it. Consequently, we can use the 23 bits of the significand field to store b1b2...b23 instead of b0b1...b22, changing the machine precision from 2^-22 to 2^-23. Since the bit string stored in the significand field is now actually the fractional part of the significand, we shall refer henceforth to the field as the fraction field. Given a string of bits in the fraction field, it is necessary to imagine that the symbols "1." appear in front of the string, even though these symbols are not stored. This technique is called hidden bit normalization and was used by Digital for the VAX machine in the late 1970s.
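The hidden bit can be seen directly in the stored encoding (an illustrative Python sketch; the helper name is ours, and the masks assume the single-precision layout described earlier):

```python
import struct

def single_significand(x: float) -> float:
    """Recover the significand 1.f of a positive normalized single-precision value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    fraction = bits & 0x7FFFFF         # only b1..b23 are stored
    return 1.0 + fraction / 2 ** 23    # the leading 1 (b0) is implied, never stored

print(single_significand(1.0))   # 1.0 -> the fraction field is all zeros
print(single_significand(6.0))   # 1.5 -> 6 = 1.5 x 2^2
```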
iii. Attributes and rounding

Attribute specification: An attribute is logically associated with a program block to modify its numerical and exception semantics. A user can specify a constant value for an attribute parameter. Some attributes have the effect of an implicit parameter to most individual operations of this standard; language standards shall specify rounding-direction attributes and should specify alternate exception handling attributes.

Other attributes change the mapping of language expressions into operations of this standard; language standards that permit more than one such mapping should provide support for:

-- preferredWidth attributes
-- value-changing optimization attributes
-- reproducibility attributes
For attribute specification, the implementation shall provide language-defined means, such as
compiler directives, to specify a constant value for the attribute parameter for all standard
operations in a block; the scope of the attribute value is the block with which it is associated.
Language standards shall provide for constant specification of the default and each specific value
of the attribute.
Rounding and Correctly Rounded Arithmetic:

We use the terminology "floating point numbers" to mean all acceptable numbers in a given IEEE floating point arithmetic format. This set consists of +/-0, subnormal and normalized numbers, and +/- infinity, but not NaN values, and is a finite subset of the reals. We have seen that most real numbers, such as 1/10 and pi, cannot be represented exactly as floating point numbers. For ease of expression we will say a general real number is normalized if its modulus lies between the smallest and largest positive normalized floating point numbers, with a corresponding use of the word subnormal. In both cases the representations we give for these numbers will parallel the floating point number representations in that b0 = 1 for normalized numbers, and b0 = 0 with E = -126 for subnormal numbers.

For any number x which is not a floating point number, there are two obvious choices for the floating point approximation to x: the closest floating point number less than x, which we call x-, and the closest floating point number greater than x, which we call x+. The IEEE standard defines the correctly rounded value of x, which we shall denote round(x), as follows. If x happens to be a floating point number, then round(x) = x. Otherwise, the correctly rounded value depends on which of the following four rounding modes is in effect:

Round down: round(x) = x-.

Round up: round(x) = x+.

Round towards zero: round(x) is either x- or x+, whichever is between zero and x.
Round to nearest
round(x) is either x_ or x+, whichever is nearer to x. In the case of a tie, the one with its least
significant bit equal to zero is chosen.
If x is positive, then x- is between zero and x, so round down and round towards zero have the
same effect. If x is negative, then x+ is between zero and x, so it is round up and round towards
zero which have the same effect. In either case, round towards zero simply requires truncating
the binary expansion, i.e. discarding bits.
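The four modes can be illustrated with Python's decimal module, whose context rounding options map closely onto the IEEE rounding modes. This is an illustrative sketch only; the report's own implementation language is VHDL.

```python
# Illustrative only: decimal's context rounding mirrors the four IEEE modes.
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN, ROUND_HALF_EVEN

x = Decimal("3.14159")  # not representable in 4 significant digits
for name, mode in [("round down",        ROUND_FLOOR),
                   ("round up",          ROUND_CEILING),
                   ("round toward zero", ROUND_DOWN),
                   ("round to nearest",  ROUND_HALF_EVEN)]:
    # plus() applies the context's precision and rounding mode to +x
    print(name, Context(prec=4, rounding=mode).plus(x))
# round down / round toward zero give 3.141; round up / round to nearest give 3.142
```

For a positive x, round down and round toward zero coincide, exactly as the text states.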
The most useful rounding mode, and the one which is almost always used, is round to nearest,
since this produces the floating point number which is closest to x. In the toy precision system,
with x = 1.7, round to nearest gives a rounded value of x equal to 1.75. When the word round is
used without any qualification, it almost always means round to nearest. In the more familiar
decimal context, if we round the number pi = 3.14159 to four significant decimal digits, we obtain
the result 3.142, which is closer to pi than the truncated result 3.141.
iv. Floating Point Arithmetic
Although integers provide an exact representation for numeric values, they suffer from two
major drawbacks:
-- the inability to represent fractional values
-- a limited dynamic range
Floating point arithmetic solves these two problems at the expense of accuracy and, on some
processors, speed. Most programmers are aware of the speed loss associated with floating point
arithmetic; however, they are blithely unaware of the problems with accuracy.
For many applications, the benefits of floating point outweigh the disadvantages.
A big problem with floating point arithmetic is that it does not follow the standard rules of
algebra. Nevertheless, many programmers apply normal algebraic rules when using floating
point arithmetic. This is a source of bugs in many programs. One of the primary
goals of this section is to describe the limitations of floating point arithmetic so it can be properly
used. Normal algebraic rules apply only to infinite precision arithmetic. Let us consider the
simple statement x := x + 1, where x is an integer. On any modern computer this statement follows the
normal rules of algebra as long as overflow does not occur. That is, this statement is valid only
for certain values of x (minint <= x < maxint).
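By contrast, floating point breaks even this simple identity without any overflow occurring. A small illustrative check (an assumed example, not taken from the report): once x is large enough that 1 falls below the rounding threshold of a double, x + 1 rounds back to x.

```python
# Assumed example (not from the report): 2**53 is the first double whose
# successor integer cannot be represented, so adding 1 is lost entirely.
x = 2.0 ** 53
print(x + 1 == x)   # True: the +1 is rounded away (round-to-nearest-even)
print(x + 2 == x)   # False: x + 2 is exactly representable
```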
smaller number, obtaining 1.68e1 which is even less correct. Extra digits available during a
computation are known as guard digits (or guard bits in the case of a binary format). They
greatly enhance accuracy during a long chain of computations.
The accuracy loss during a single computation usually isn't enough to worry about
unless we are greatly concerned about the accuracy of our computations. However, if we compute
a value which is the result of a sequence of floating point operations, the error can accumulate
and greatly affect the computation itself. For example, suppose we
were to add 1.23e3 and 1.00e0. Adjusting the numbers so their exponents are the same before
the addition produces 1.23e3 + 0.001e3. The sum of these two values, even after rounding, is
1.23e3. This might seem perfectly reasonable; after all, we can only maintain three significant
digits, and adding in a small value shouldn't affect the result at all.
However, suppose we were to add 1.00e0 to 1.23e3 ten times. The first time we add 1.00e0 to
1.23e3 we get 1.23e3. Likewise, we get this same result the second, third, fourth, and tenth
time we add 1.00e0 to 1.23e3. On the other hand, had we added 1.00e0 to itself ten times, then
added the result (1.00e1) to 1.23e3, we would have gotten a different result, 1.24e3. This is the
most important thing to know about limited precision arithmetic:
The order of evaluation can affect the accuracy of the result.
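The three-significant-digit example above can be reproduced with Python's decimal module. This is an illustrative sketch; the report's arithmetic is binary and its implementation target is VHDL.

```python
# Reproducing the 3-significant-digit example with Python's decimal module.
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)
big, one = Decimal("1.23E+3"), Decimal("1.00E+0")

acc = big
for _ in range(10):          # ten separate additions: each rounds the 1 away
    acc = ctx.add(acc, one)
print(acc)                   # 1.23E+3

small = Decimal(0)
for _ in range(10):          # sum the small values first...
    small = ctx.add(small, one)
print(ctx.add(big, small))   # ...then add once: 1.24E+3
```

The same two operand sets, evaluated in a different order, give different results.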
We can get more accurate results if the relative magnitudes (that is, the exponents) are close to
one another. Whenever a chain calculation involving addition and subtraction is being
performed, we should attempt to group the values appropriately. Another problem with
addition and subtraction is that you can wind up with false precision. Consider the computation
1.23e0 - 1.22e0. This produces 0.01e0. Although this is mathematically equivalent to 1.00e-2,
this latter form suggests that the last two digits are exactly zero. Unfortunately, we've only got a
single significant digit at this time. Indeed, some FPUs or floating point software packages might
actually insert random digits (or bits) into the least significant positions. This brings up a second
important rule concerning limited precision arithmetic:
Whenever subtracting two numbers with the same signs or adding two numbers with different
signs, the accuracy of the result may be less than the precision available in the floating point
format.
Multiplication and division do not suffer from the same problems as addition and
subtraction since we do not have to adjust the exponents before the operation; all we need to do
is add the exponents and multiply the mantissas (or subtract the exponents and divide the
mantissas). By themselves, multiplication and division do not produce particularly poor results.
However, they tend to multiply any error which already exists in a value. For example, if we
multiply 1.23e0 by two, when we should be multiplying 1.24e0 by two, the result is even less
accurate. This brings up a third important rule when working with limited precision arithmetic:
When performing a chain of calculations involving addition, subtraction, multiplication,
and division, try to perform the multiplication and division operations first.
Often, by applying normal algebraic transformations, we can arrange a calculation so the
multiply and divide operations occur first. For example, suppose we want to compute x*(y+z).
Normally we would add y and z together and multiply their sum by x. However, we can get a
little more accuracy if we transform x*(y+z) to get x*y+x*z and compute the result by
performing the multiplications first. Multiplication and division are not without their own
problems. When multiplying two very large or very small numbers, it is quite possible for
overflow or underflow to occur. The same situation occurs when dividing a small number by a
large number or dividing a large number by a small number. This brings up a fourth rule we
should attempt to follow when multiplying or dividing values:
When multiplying and dividing sets of numbers, try to arrange the multiplications
so that they multiply large and small numbers together; likewise, try to divide numbers that have
the same relative magnitudes.
Comparing floating point numbers is very dangerous. Given the inaccuracies present in any
computation (including converting an input string to a floating point value), two floating point
values should never be compared to see if they are equal. In a binary floating point format,
different computations which produce the same (mathematical) result may differ in their least
significant bits. For example, adding 1.31e0 + 1.69e0 should produce 3.00e0. Likewise, adding
1.50e0 + 1.50e0 should produce 3.00e0. However, if we were to compare (1.31e0 + 1.69e0) against
(1.50e0 + 1.50e0), we might find out that these sums are not equal to one another. The test for
equality succeeds if and only if all bits (or digits) in the two operands are exactly the same. Since
this is not necessarily true after two different floating point computations which should produce
the same result, a straight test for equality may not work.
The standard way to test for equality between floating point numbers is to determine how much
error (or tolerance) you will allow in a comparison and check to see if one value is within this
error range of the other. The straightforward way to do this is to use a test like the following:
if Value1 >= (Value2 - error) and Value1 <= (Value2 + error) then ...
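The tolerance test can be sketched in Python as follows (an illustration only; the report's implementation language is VHDL, and the bound `error` here is a caller-chosen value, not one from the report).

```python
# A sketch of the tolerance comparison: two sums that are mathematically
# equal may differ in their low bits, but agree within a chosen tolerance.
import math

a = 1.31 + 1.69              # mathematically 3.00
b = 1.50 + 1.50              # mathematically 3.00, exact in binary
error = 1e-9

within = (a >= b - error) and (a <= b + error)
print(within)                              # True
print(math.isclose(a, b, abs_tol=error))   # True: the idiomatic equivalent
```

`math.isclose` is the idiomatic Python form of the same test, with both relative and absolute tolerance parameters.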
A binary floating-point number x is represented as a significand and an exponent, x = s * 2^e.
The formula
(s1 * 2^e1) * (s2 * 2^e2) = (s1 * s2) * 2^(e1+e2)
shows that a floating-point multiply algorithm has several parts. The first part multiplies
the significands using ordinary integer multiplication. Because floating point numbers are
stored in sign-magnitude form, the multiplier need only deal with unsigned numbers
(although we have seen that Booth recoding handles signed two's complement numbers
painlessly). The second part rounds the result. If the significands are unsigned p-bit
numbers (e.g., p = 24 for single precision), then the product can have as many as 2p bits and
must be rounded to a p-bit number. The third part computes the new exponent.
Because exponents are stored with a bias, this involves subtracting the bias from the sum
of the biased exponents.
Example
How does the multiplication of the single-precision numbers
1 10000010 000. . . = -1 * 2^3
0 10000011 000. . . = 1 * 2^4
proceed in binary?
Answer
When unpacked, the significands are both 1.0, their product is 1.0, and so the
result is of the form
1 ???????? 000. . .
To compute the exponent, use the formula
biased exp(e1 + e2) = biased exp(e1) + biased exp(e2) - bias
The bias is 127 = 01111111 in binary, so in two's complement -127 is 10000001. Thus, the biased
exponent of the product is
  10000010
  10000011
+ 10000001
----------
  10000110
Since this is 134 decimal, it represents an exponent of 134 - bias = 134 - 127 = 7, as
expected.
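The exponent arithmetic in this example can be checked mechanically. The following Python sketch (the helper `fields` is mine, not part of the report's VHDL design) unpacks the two operands and confirms that adding the biased exponents and subtracting the bias once yields 134.

```python
# Checking the worked example: biased exponents add, the bias is removed once.
import struct

def fields(x):
    """Unpack an IEEE-754 single into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

s1, e1, f1 = fields(-1.0 * 2**3)   # 1 10000010 000...
s2, e2, f2 = fields( 1.0 * 2**4)   # 0 10000011 000...
print(e1, e2)                      # 130 131

e_prod = e1 + e2 - 127             # add biased exponents, subtract the bias
print(e_prod)                      # 134 = 10000110, i.e. unbiased exponent 7
print(fields((-1.0 * 2**3) * (1.0 * 2**4))[1] == e_prod)   # True
```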
The interesting part of floating-point multiplication is rounding. Since the cases are similar
in all bases, the discussion here uses human-friendly base 10 rather than base 2.
For floating point number multiplication it is necessary to know about floating point number
addition, since while performing floating point multiplication we have to perform addition
anyhow to get the final result. The addition may generate a carry, after which the result must be
renormalized, and in renormalizing it may lose precision bits. For that we keep three extra bits:
guard, round and sticky. Hence, it is important to know how addition occurs before studying
multiplication. The next chapter describes how addition is done and what procedures are
followed in order to get the final result.
Chapter 3
3. ADDITION ALGORITHM
Let a1 and a2 be the two numbers to be added. The notations ei and si are used for the
exponent and significand of the addend ai. This means that the floating-point inputs
have been unpacked and that si has an explicit leading bit. To add a1 and a2, perform
these steps:
1. If e1 < e2, swap the operands. This ensures that the difference of the exponents
satisfies d = e1 - e2 >= 0. Tentatively set the exponent of the result to e1.
2. If the signs of a1 and a2 differ, replace s2 by its two's complement.
3. Place s2 in a p-bit register and shift it d = e1 - e2 places to the right (shifting in 1s if
s2 was complemented in the previous step). From the bits shifted out, set g to the most-
significant bit, r to the next most-significant bit, and set the sticky bit s to the OR of the rest.
4. Compute a preliminary significand S = s1 + s2 by adding s1 to the p-bit register
containing s2. If the signs of a1 and a2 are different, the most-significant bit of S is 1, and
there was no carry out, then S is negative. Replace S with its two's complement. This can
only happen when d = 0.
5. Shift S as follows. If the signs of a1 and a2 are the same and there was a carry out in step
4, shift S right by one, filling the high order position with one (the carry out). Otherwise
shift it left until it is normalized. When left shifting, on the first shift fill in the low order
position with the g bit. After that, shift in zeros. Adjust the exponent of the result
accordingly.
6. Adjust r and s. If S was shifted right in step 5, set r := low order bit of S before shifting
and s := g OR r OR s. If there was no shift, set r := g, s := r. If there was a single left shift,
don't change r and s. If there were two or more left shifts, set r := 0, s := 0. (In the last
case, two or more shifts can only happen when a1 and a2 have opposite signs and the
same exponent, in which case the computation s1 + s2 in step 4 will be exact.)
7. Compute the sign of the result. If a1 and a2 have the same sign, this is the sign of the
result. If a1 and a2 have different signs, then the sign of the result depends on which of
a1, a2 is negative, whether there was a swap in step 1, and whether S was replaced by its
two's complement in step 4.
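Step 3 above, extracting the guard, round, and sticky bits during alignment, can be sketched as follows. This is an illustrative Python model, not the report's VHDL, and the helper name `align` is my own.

```python
# An illustrative model of step 3: shift a significand right by d places and
# capture the guard (g), round (r), and sticky (s) bits from what falls off.
def align(sig, d):
    shifted_out = sig & ((1 << d) - 1) if d > 0 else 0
    kept = sig >> d
    g = (shifted_out >> (d - 1)) & 1 if d >= 1 else 0   # most-significant lost bit
    r = (shifted_out >> (d - 2)) & 1 if d >= 2 else 0   # next lost bit
    s = int((shifted_out & ((1 << max(d - 2, 0)) - 1)) != 0)  # OR of the rest
    return kept, g, r, s

# 10110 shifted right 3 places loses the bits 1, 1, 0 -> g = 1, r = 1, s = 0
print(align(0b10110, 3))   # (2, 1, 1, 0)
```

The kept bits plus (g, r, s) are exactly what step 6 later consults when the sum is normalized and rounded.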
3.1 ABOUT FLOATING POINT ARITHMETIC
Arithmetic operations on floating point numbers consist of addition, subtraction, multiplication
and division. The operations are done with algorithms similar to those used on sign-magnitude
integers (because of the similarity of representation): for example, only add numbers of the same
sign; if the numbers are of opposite sign, a subtraction must be done.
ADDITION
Example on a decimal value given in scientific notation:
3.25 x 10 ** 3
+ 2.63 x 10 ** -1
-----------------
first step: align decimal points
second step: add
3.25 x 10 ** 3
+ 0.000263 x 10 ** 3
--------------------
3.250263 x 10 ** 3
(presumes use of infinite precision, without regard for accuracy)
third step: normalize the result (already normalized!)
example on a floating point value given in binary:
.25 = 0 01111101 00000000000000000000000
100 = 0 10000101 10010000000000000000000
to add these floating point representations,
step 1: align radix points
shifting the mantissa LEFT by 1 bit DECREASES THE EXPONENT by 1
shifting the mantissa RIGHT by 1 bit INCREASES THE EXPONENT by 1
we want to shift the mantissa right, because the bits that fall off the end should come from the
least significant end of the mantissa
-> we choose to shift the .25, since we want to increase its exponent
-> shift by:
  10000101
- 01111101
----------
  00001000 (8) places
0 01111101 00000000000000000000000 (original value)
0 01111110 10000000000000000000000 (shifted 1 place)
(note that the hidden bit is shifted into the msb of the mantissa)
0 01111111 01000000000000000000000 (shifted 2 places)
0 10000000 00100000000000000000000 (shifted 3 places)
0 10000001 00010000000000000000000 (shifted 4 places)
0 10000010 00001000000000000000000 (shifted 5 places)
0 10000011 00000100000000000000000 (shifted 6 places)
0 10000100 00000010000000000000000 (shifted 7 places)
0 10000101 00000001000000000000000 (shifted 8 places)
step 2: add (the hidden bit for the 100 shouldn't be forgotten)
0 10000101 1.10010000000000000000000 (100)
+ 0 10000101 0.00000001000000000000000 (.25)
---------------------------------------
0 10000101 1.10010001000000000000000
step 3: normalize the result (get the "hidden bit" to be a 1)
it already is for this example.
result is: 0 10000101 10010001000000000000000
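The worked example can be verified mechanically. This Python check (an illustration; the report's implementation target is VHDL) decodes the three bit patterns and confirms the sum.

```python
# Decoding the example's single-precision bit patterns to confirm the result.
import struct

def to_float(bits):
    """Interpret a 32-character binary string as an IEEE-754 single."""
    return struct.unpack(">f", struct.pack(">I", int(bits, 2)))[0]

a = to_float("0" + "01111101" + "00000000000000000000000")   # .25
b = to_float("0" + "10000101" + "10010000000000000000000")   # 100
r = to_float("0" + "10000101" + "10010001000000000000000")   # claimed sum
print(a, b, r)     # 0.25 100.0 100.25
print(a + b == r)  # True: 100.25 is exactly representable, no rounding needed
```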
CONCLUSION
For floating point number multiplication it is necessary to know about floating
point number addition, since while performing floating point multiplication we have
to perform addition anyhow to get the final result. The addition may generate a
carry, after which the result must be renormalized, and in renormalizing it may lose
precision bits. For that we keep three extra bits: guard, round and sticky. It is
important to know how addition occurs before studying multiplication. This report
has described how addition is done and what procedures are followed in order to
get the final result. We have now studied and gathered ideas about floating point
addition, which will be helpful while doing the multiplication part in our next
semester as our major project.
REFERENCES
Liang-Kai Wang and Michael J. Schulte, "Decimal Floating-Point Adder and Multifunction
Unit with Injection-Based Rounding," 18th IEEE Symposium on Computer Arithmetic
(ARITH'07), 2007.
G. Even and P. M. Seidel, "A comparison of three rounding algorithms for IEEE
floating-point multiplication," IEEE Transactions on Computers, 49(7), July 2000.
N. Burgess, "Renormalization rounding in IEEE floating-point operations using a flagged
prefix adder," IEEE Transactions on VLSI Systems, 13(2):266-277, Feb 2005.
"IEEE Standard for Floating-Point Arithmetic," IEEE Std 754-2008, IEEE Computer Society,
sponsored by the Microprocessor Standards Committee, 29 August 2008.
M. S. Schmookler and A. W. Weinberger, "High speed decimal addition," IEEE Transactions
on Computers, C-20:862-867, Aug 1971.