
    CONTENTS

ABSTRACT
LIST OF SYMBOLS AND ACRONYMS
VHDL
INTRODUCTION TO FLOATING POINT NUMBER
INTRODUCTION
HISTORY
RANGE OF FPN
FPN PRECISION
IEEE 754 FPN STANDARD
FPN REPRESENTATION
COMPUTER REPRESENTATION
IEEE FPN REPRESENTATION
ATTRIBUTES & ROUNDING
FPN ARITHMETIC
FPN REPRESENTATION
FORMAT PARAMETERS FOR THE IEEE 754 FLOATING-POINT STANDARD
FPN MULTIPLICATION
DENORMALS
FPN MULTIPLICATION ALGORITHM
HARDWARE OF FLOATING POINT MULTIPLIER
UNSIGNED MULTIPLIER
ADDITION PROCESS
NORMALIZER
UNDERFLOW/OVERFLOW DETECTION
MULTIPLICATION FLOWCHART
STRUCTURE OF MULTIPLICATION
FLOATING POINT MULTIPLIER ARCHITECTURE
PROPOSED CIFM ARCHITECTURE
Real-life application
Optimization criteria
Application
APPENDIX
CONCLUSION
REFERENCES


    FLOATING POINT MULTIPLICATION USING VHDL

A report submitted in partial fulfillment of the requirements for the Degree

    of

    Bachelor of Technology

    in

Electronics and Communication Engineering

    Under the Guidance of

    Manas Ranjan Tripathy

    Department of Electronics and Communication Engineering

    INSTITUTE OF TECHNICAL EDUCATION & RESEARCH, BHUBANESWAR

    (SIKSHA O ANUSANDHAN UNIVERSITY, ODISHA)

    2012

Submitted by: Bibhu Bhushan Panda (0911016214)

Sadbhab Patra (0911016231)

Chandrakanta Parida (1021016041)

Sweta Chandan (0911016244)


    INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH

    CERTIFICATE

This is to certify that the project titled FLOATING POINT MULTIPLICATION USING VHDL is a bonafide work of group C4, in partial fulfillment for the award of

the Degree of Bachelor of Technology in Electronics and Communication Engineering,

    conducted under my supervision.

    Project guide:

    Mr. Manas Ranjan Tripathy

    (Lecturer)

Department of Electronics and Communication Engineering

    ITER, BHUBANESWAR


    DECLARATION

    We certify that

    a. The work contained in this report is original and has been done by us under the guidance of

    our supervisor.

    b. The work has not been submitted to any other Institute for any degree or diploma.

    c. We have followed the guidelines provided by the Institute in preparing the report.

    d. We have conformed to the norms and guidelines given in the Ethical Code of Conduct of the

    Institute.

    e. Whenever we have used materials (data, theoretical analysis, figures, and text) from other

    sources, we have given due credit to them by citing them in the text of the report and giving their

    details in the reference. Further, we have taken permission from the copyright owners of the

    sources, whenever necessary.

    BIBHU BHUSHAN PANDA (0911016214)

    SADBHAB PATRA (0911016231)

    CHANDRAKANTA PARIDA (1021016041)

SWETA CHANDAN (0911016244)


    ACKNOWLEDGMENT

    We would like to thank Mr. Manas Ranjan Tripathy for providing us this

    opportunity to present the project on FLOATING POINT

    MULTIPLICATION USING VHDL

We would like to thank Prof. Bibhu Prasad Mohanty (HOD), Prof. Dr.

Niva Das (Associate Dean) & Mr. Manas Ranjan Tripathy for their constant support and guidance. We would also extend our gratitude to the faculty and staff of

the Department of Electronics and Communication Engineering, for their valuable

insights which made this project a success.

Lastly, we would like to thank one and all who helped in building this project and

guided us in all aspects to its success.

    BIBHU BHUSHAN PANDA (0911016214)

    SADBHAB PATRA (0911016231)

    CHANDRAKANTA PARIDA (1021016041)

SWETA CHANDAN (0911016244)


    ABSTRACT

Shrinking feature sizes give designers more headroom to extend the functionality of microprocessors. As processor support for decimal floating-point arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common decimal floating-point arithmetic operations.

This paper presents designs for a decimal floating-point adder. Binary floating-point arithmetic is usually sufficient for scientific and statistical applications. However, it is not sufficient for many commercial applications and database systems, in which operations often need to mirror manual calculations. Therefore, these applications often use software to perform decimal floating-point arithmetic operations.

This standard provides a method for computation with floating-point numbers that will yield the same result whether the processing is done in hardware, software, or a combination of the two. The results of the computation will be identical, independent of implementation, given the same input data. Errors, and error conditions, in the mathematical processing will be reported in a consistent manner regardless of implementation.

Keywords: exponent, normalized value, subnormal numbers.


LIST OF SYMBOLS

Serial No.  Symbol  Meaning
1           X       Real number
2           M       Significand
3           E       Exponent

LIST OF ACRONYMS

Serial No.  Acronym  Meaning
1           OFL      Overflow level
2           UFL      Underflow level
3           NaN      Not a Number


    Chapter 1

1. VHDL

The VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) was first proposed in 1981. The development of VHDL was originated by IBM, Texas Instruments, and Intermetrics in 1983. The result, contributed by many participating EDA (Electronic Design Automation) groups, was adopted as the IEEE 1076 standard in December 1987. VHDL is intended to provide a tool that can be used by the digital systems community to distribute their designs in a standard format. Using VHDL, they are able to talk to each other about their complex digital circuits in a common language without the difficulty of revealing technical details.

As a standard description of digital systems, VHDL is used as input and output to various simulation, synthesis, and layout tools. The language provides the ability to describe systems, networks, and components at a very high behavioral level as well as at a very low gate level. It also represents a top-down methodology and environment.

Simulations can be carried out at any level, from a general functional analysis to a very detailed gate-level waveform analysis.

    1.1 INTRODUCTION TO FLOATING POINT NUMBERS

    1. INTRODUCTION

    In computing, floating point describes a method of representing real numbers in a way that can

    support a wide range of values. Numbers are, in general, represented approximately to a fixed

number of significant digits and scaled using an exponent. The base for the scaling is normally 2,

    10 or 16. The typical number that can be represented exactly is of the form:

significant digits × base^exponent
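As a quick illustration (added here, not part of the original report), the Python sketch below decomposes a value into the significand-times-base^exponent form described above, using base 2; math.frexp and float.hex are standard-library helpers.

# Illustrative sketch only: decompose a float into significand * base^exponent form (base 2).
import math

x = 6.25
mantissa, exponent = math.frexp(x)    # x == mantissa * 2**exponent, with 0.5 <= |mantissa| < 1
print(mantissa, exponent)             # 0.78125 3
print(mantissa * 2 ** exponent)       # 6.25
print(x.hex())                        # 0x1.9000000000000p+2, i.e. 1.5625 * 2**2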


    The term floating point refers to the fact that the radix point (decimal point, or, more commonly

    in computers, binary point) can "float"; that is, it can be placed anywhere relative to the

    significant digits of the number. This position is indicated separately in the internal

    representation, and floating-point representation can thus be thought of as a computer realization

of scientific notation. Over the years, a variety of floating-point representations have been used

    in computers. However, since the 1990s, the most commonly encountered representation is that

    defined by the IEEE 754 Standard.

The advantage of floating-point representation over fixed-point and integer representation is that

    it can support a much wider range of values. For example, a fixed-point representation that has

    seven decimal digits with two decimal places can represent the numbers 12345.67, 123.45, 1.23

    and so on, whereas a floating-point representation (such as the IEEE 754 decimal32 format) with

    seven decimal digits could in addition represent 1.234567, 123456.7, 0.00001234567,

    1234567000000000, and so on. The floating-point format needs slightly more storage (to encode

    the position of the radix point), so when stored in the same space, floating-point numbers achieve

their greater range at the expense of precision.

i. History

In 1914, Leonardo Torres y Quevedo designed an electro-mechanical version of the Analytical Engine of Charles Babbage which included floating-point arithmetic. In 1938, Konrad Zuse of Berlin completed the Z1, the first mechanical binary programmable computer; it was, however, unreliable in operation. It worked with 22-bit binary floating-point numbers having a 7-bit signed exponent, a 15-bit significand (including one implicit bit), and a sign bit. The memory used sliding metal parts to store 64 words of such numbers. The relay-based Z3, completed in 1941, had representations for plus and minus infinity. It implemented defined operations with infinity, such as 1/∞ = 0, and stopped on undefined operations such as 0 × ∞. It also implemented the square root operation in hardware.

Zuse also proposed, but did not complete, carefully rounded floating-point arithmetic that would have included ±∞ and NaNs, anticipating features of the IEEE Standard floating-point by four decades. By contrast, von Neumann recommended against floating point for the 1951 IAS machine, arguing that fixed-point arithmetic was preferable.

The first commercial computer with floating point hardware was Zuse's Z4 computer, designed in

1942-1945. The Bell Laboratories Mark V computer implemented decimal floating point in

    1946.

    Prior to the IEEE-754 standard, computers used many different forms of floating-point. These

    differed in the word sizes, the format of the representations, and the rounding behavior of

    operations. These differing systems implemented different parts of the arithmetic in hardware

    and software, with varying accuracy.

The IEEE-754 standard was created in the early 1980s, after word sizes of 32 bits (or 16 or 64) had been generally settled upon. It was based on a proposal from Intel, which was designing the i8087 numerical coprocessor. Prof. W. Kahan was the primary architect behind this proposal, along with his student Jerome Coonen at U.C. Berkeley and visiting Prof. Harold Stone, for which he was awarded the 1989 Turing Award. Among the innovations are these:

-- A precisely specified encoding of the bits, so that all compliant computers would interpret bit patterns the same way. This made it possible to transfer floating-point numbers from one computer to another.

-- A precisely specified behavior of the arithmetic operations: arithmetic operations were required to be correctly rounded, i.e. to give the same result as if infinitely precise arithmetic was used and then rounded. This meant that a given program, with given data, would always produce the same result on any compliant computer. This helped reduce the almost mystical reputation that floating-point computation had for seemingly nondeterministic behavior.

-- The ability of exceptional conditions (overflow, divide by zero, etc.) to propagate through a computation in a benign manner and be handled by the software in a controlled way.


ii. Range of floating-point numbers

    By allowing the radix point to be adjustable, floating-point notation allows calculations over a

    wide range of magnitudes, using a fixed number of digits, while maintaining good precision. For

    example, in a decimal floating-point system with three digits, the multiplication that humans

    would write as

0.12 × 0.12 = 0.0144

would be expressed as

(1.2 × 10^-1) × (1.2 × 10^-1) = (1.44 × 10^-2).

In a fixed-point system with the decimal point at the left, it would be

0.120 × 0.120 = 0.014.

    A digit of the result was lost because of the inability of the digits and decimal point to 'float'

    relative to each other within the digit string.
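A small Python sketch (an illustration added here, not from the report) reproduces this three-digit example with the decimal module: a three-significant-digit floating format keeps all three digits of the product, while quantizing to three fixed decimal places loses one.

from decimal import Decimal, getcontext, ROUND_HALF_EVEN

getcontext().prec = 3                         # three significant digits ("floating" precision)
product = Decimal("0.12") * Decimal("0.12")
print(product)                                # 0.0144

# A fixed-point view with three places after the decimal point loses a digit:
print(product.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN))   # 0.014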

    The range of floating-point numbers depends on the number of bits or digits used for

    representation of the significand (the significant digits of the number) and for the exponent. On a

typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit. Positive floating-point numbers in this format have an approximate range of 10^-308 to 10^308, because the range of the exponent is [-1022, 1023] and 308 is approximately log10(2^1023). The complete range of the format is from about -10^308 through +10^308.

The number of normalized floating point numbers in a system F(B, P, L, U) (where B is the base of the system, P is the precision of the system to P digits, L is the smallest exponent representable in the system, and U is the largest exponent used in the system) is 2(B - 1)B^(P-1)(U - L + 1).

There is a smallest positive normalized floating-point number, Underflow level = UFL = B^L, which has a 1 as the leading digit and 0 for the remaining digits of the significand, and the smallest possible value for the exponent.


There is a largest floating point number, Overflow level = OFL = (1 - B^(-P)) × B^(U+1), which has B - 1 as the value for each digit of the significand and the largest possible value for the exponent.

In addition, there are representable values strictly between -UFL and UFL: namely, zero and negative zero, as well as subnormal numbers.
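As a hedged illustration, the Python sketch below evaluates UFL and OFL for a system F(B, P, L, U); the single-precision parameters B = 2, P = 24, L = -126, U = 127 are assumed here purely for the example and match the minimum and maximum values quoted later in Table 1.

# Illustrative sketch: UFL and OFL for a floating-point system F(B, P, L, U).
B, P, L, U = 2, 24, -126, 127        # assumed IEEE single-precision parameters

UFL = B ** L                          # 1.00...0 * B**L, smallest positive normalized number
OFL = (1 - B ** (-P)) * B ** (U + 1)  # (B-1).(B-1)...(B-1) * B**U, largest finite number

print(UFL)    # approximately 1.18e-38
print(OFL)    # approximately 3.40e+38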

iii. Floating-point precisions

    IEEE 754:

    16-bit: Half (binary16)

    32-bit: Single (binary32), decimal32

    64-bit: Double (binary64), decimal64

    128-bit: Quadruple (binary128), decimal128

    Extended precision formats

Other: Minifloat, Arbitrary precision

    The IEEE has standardized the computer representation for binary floating-point numbers in

IEEE 754 (also known as IEC 60559). This standard is followed by almost all modern machines. Notable

exceptions include IBM mainframes, which support IBM's own format (in addition to the IEEE 754 binary and decimal formats), and Cray vector machines, where the T90 series had an IEEE

    version, but the SV1 still uses Cray floating-point format.

    The standard provides for many closely related formats, differing in only a few details. Five of

    these formats are called basic formats and others are termed extended formats, and three of these

    are especially widely used in computer hardware and languages:

Single precision, called "float" in the C language family, and "real" or "real*4" in Fortran. This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).

Double precision, called "double" in the C language family, and "double precision" or "real*8" in Fortran. This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits); a short sketch after this list illustrates the difference in precision.


Double extended format, an 80-bit floating point value. This is implemented on most personal computers but not on other devices. Sometimes "long double" is used for this in the C language family (the C99 and C11 standards' "IEC 60559 floating-point arithmetic extension", Annex F, recommend the 80-bit extended format to be provided as "long double" when available), though "long double" may be a synonym for "double" or may stand for quadruple precision. Extended precision can help minimise accumulation of round-off error in intermediate calculations.
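The sketch below (illustrative Python, not part of the report) shows the practical difference between the single and double formats mentioned in the list above by round-tripping 1/3 through 32-bit and 64-bit encodings with the struct module.

import struct

x = 1.0 / 3.0

as_single = struct.unpack("<f", struct.pack("<f", x))[0]   # binary32
as_double = struct.unpack("<d", struct.pack("<d", x))[0]   # binary64

print(f"{as_single:.20f}")   # 0.33333334326744079590  (about 7 correct decimal digits)
print(f"{as_double:.20f}")   # 0.33333333333333331483  (about 16 correct decimal digits)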

Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format. Furthermore, a wide range of powers of 2 times such

    a number can be represented. These properties are sometimes used for purely integer data, to get

    53-bit integers on platforms that have double precision floats but only 32-bit integers. To a rough

    approximation, the bit representation of an IEEE binary floating-point number is proportional to

    its base 2 logarithm.

    Chapter 2


2. IEEE-754 FLOATING-POINT STANDARD

In the early days of digital computers, it was quite common that machines from different vendors had different word lengths and unique floating-point formats. This caused many problems, especially in the porting of programs between different machines (designs). A main objective in developing such a floating-point representation standard is to make numerical programs predictable and completely portable, in the sense of producing identical results when run on different machines. The IEEE-754 floating-point standard, formally named ANSI/IEEE Std 754-1985 and introduced in 1985, tried to solve these problems. The main objective of this standard is that an implementation of a floating-point system conforming to this standard can be realized in software, entirely in hardware, or in any combination of software and hardware. The standard specifies two formats for floating-point numbers, basic (single precision) and extended (double precision); it also specifies the basic operations for both formats, such as addition and subtraction. Finally, it describes the different floating-point exceptions and their handling, including non-numbers (NaNs).

Table 1: Features of the ANSI/IEEE Standard Floating-Point Representation

Feature              Single                    Double
Word length, bits    32                        64
Significand bits     23 + 1 (hidden)           52 + 1 (hidden)
Significand range    [1, 2 - 2^-23]            [1, 2 - 2^-52]
Exponent bits        8                         11
Exponent bias        127                       1023
Zero (±0)            e + bias = 0, f = 0       e + bias = 0, f = 0
Denormal             e + bias = 0, f ≠ 0       e + bias = 0, f ≠ 0
Infinity (±∞)        e + bias = 255, f = 0     e + bias = 2047, f = 0
Not-a-Number (NaN)   e + bias = 255, f ≠ 0     e + bias = 2047, f ≠ 0
Minimum              2^-126 ≈ 1.2 × 10^-38     2^-1022 ≈ 2.2 × 10^-308
Maximum              ≈ 2^128 ≈ 3.4 × 10^38     ≈ 2^1024 ≈ 1.8 × 10^308
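The encodings in Table 1 can be checked with a short Python sketch (added here for illustration only): it splits a single-precision pattern into its sign, biased exponent and fraction fields and classifies the value accordingly.

import struct

def classify(x: float) -> str:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_e = (bits >> 23) & 0xFF          # e + bias, 8 bits
    fraction = bits & 0x7FFFFF              # f, 23 bits
    if biased_e == 0:
        kind = "zero" if fraction == 0 else "denormal"
    elif biased_e == 255:
        kind = "infinity" if fraction == 0 else "NaN"
    else:
        kind = "normalized"
    return f"sign={sign} e+bias={biased_e} f={fraction:#08x} -> {kind}"

for value in (0.0, 1.0, -6.25, 1e-45, float("inf"), float("nan")):
    print(classify(value))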

    PROBLEMS ASSOCIATED WITH FLOATING POINT ADDITION


For the inputs, the exponents of the numbers may be dissimilar, and dissimilar exponents cannot be added directly. So the first problem is equalizing the exponents. To equalize the exponents, the exponent of the smaller number must be increased until it equals that of the larger number. Then the significands are added. The fixed sizes of the mantissa and exponent of a floating-point number cause many problems to arise during addition and subtraction. The second problem is associated with overflow of the mantissa. It can be solved by rounding the result. The third problem is associated with overflow and underflow of the exponent. The former occurs when the mantissa overflows and an adjustment in the exponent is attempted; the latter can occur while normalizing a small result. Unlike the case in fixed-point addition, an overflow in the mantissa is not disabling; simply shifting the mantissa and increasing the exponent can compensate for such an overflow. Another problem is associated with normalization of addition and subtraction. The sum or difference of two significands may be a number which is not in normalized form, so it should be normalized before returning results.

    2.1 Floating Point Representation

    i. Computer Representation of Numbers

    Computers which work with real arithmetic use a system called floating point.

Suppose a real number x has the binary expansion

x = ±m × 2^E, with 1 ≤ m < 2, and

m = (b0.b1b2b3 ...)2.

To store a number in floating point representation, a computer word is divided into 3 fields, representing the sign, the exponent E, and the significand m respectively. A 32-bit word could be divided into fields as follows: 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand. Since the exponent field is 8 bits, it can be used to represent exponents between -128 and 127. The significand field can store the first 23 bits of the binary representation of m, namely b0.b1 ... b22.


FORMATS: This clause defines floating-point formats, which are used to represent a finite subset of real numbers. Formats are characterized by their radix, precision, and exponent range, and each format can represent a unique set of floating-point data. All formats can be supported as arithmetic formats; that is, they may be used to represent floating-point operands. Specific fixed-width encodings for binary and decimal formats are defined in this clause for a subset of the formats. These interchange formats are identified by their size and can be used for the exchange of floating-point data between implementations.

Five basic formats are defined: three binary formats, with encodings in lengths of 32, 64, and 128 bits, and two decimal formats, with encodings in lengths of 64 and 128 bits.

Additional arithmetic formats are recommended for extending these basic formats. The choice of which of this standard's formats to support is language-defined or, if the relevant language standard is silent or defers to the implementation, implementation-defined. The names used for formats in this standard are not necessarily those used in programming environments.

    ii. IEEE Floating Point Representation

In the 1960's and 1970's, each computer manufacturer developed its own floating point system, leading to a lot of inconsistency as to how the same program behaved on different machines. For example, although most machines used binary floating point systems, the IBM 360/370 series, which dominated computing during this period, used a hexadecimal base, i.e. numbers were represented as ±m × 16^E. Other machines, such as HP calculators, used a decimal floating point system. Through the efforts of several computer scientists, particularly W. Kahan, a binary floating point standard was developed in the early 1980's and, most importantly, followed very carefully by the principal manufacturers of floating point chips for personal computers, namely Intel and Motorola. This standard has become known as the IEEE floating point standard since it was developed and endorsed by a working committee of the Institute of Electrical and Electronics Engineers.

    The IEEE standard has three very important requirements:

-- consistent representation of floating point numbers across all machines adopting the standard

-- correctly rounded arithmetic

-- consistent and sensible treatment of exceptional situations such as division by zero

We start with the following observation. In the last section, we chose to normalize a nonzero number x so that x = ±m × 2^E, where 1 ≤ m < 2, i.e.

m = (b0.b1b2b3 ...)2

with b0 = 1. In the simple floating point model, we stored the leading nonzero bit b0 in the first position of the field provided for m. Note, however, that since we know this bit has the value one, it is not necessary to store it. Consequently, we can use the 23 bits of the significand field to store b1b2 ... b23 instead of b0b1 ... b22, changing the machine precision from 23 bits to 24 bits. Since the bitstring stored in the significand field is now actually the fractional part of the significand, we shall refer henceforth to the field as the fraction field. Given a string of bits in the fraction field, it is necessary to imagine that the symbols "1." appear in front of the string, even though these symbols are not stored. This technique is called hidden bit normalization and was used by Digital for the VAX machine in the late 1970s.
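A two-line Python check (illustrative only, not from the report) makes the hidden bit visible: the 23-bit fraction field stored for 1.5 contains only the bits after the implied leading "1.".

import struct

def fraction_field(x: float) -> str:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return format(bits & 0x7FFFFF, "023b")    # the stored fraction bits b1..b23

print(fraction_field(1.5))   # 10000000000000000000000  (1.5 = 1.1000...0 in binary)
print(fraction_field(1.0))   # 00000000000000000000000  (the leading 1 is never stored)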

iii. Attributes and rounding

Attribute specification: An attribute is logically associated with a program block to modify its numerical and exception semantics. A user can specify a constant value for an attribute parameter. Some attributes have the effect of an implicit parameter to most individual operations of this standard; language standards shall specify

rounding-direction attributes and should specify

alternate exception handling attributes.

    Other attributes change the mapping of language expressions into operations of this standard;

    language standards that permit more than one such mapping should provide support for:

    preferredWidth attributes


    value-changing optimization attributes

    reproducibility attributes

    For attribute specification, the implementation shall provide language-defined means, such as

    compiler directives, to specify a constant value for the attribute parameter for all standard

    operations in a block; the scope of the attribute value is the block with which it is associated.

    Language standards shall provide for constant specification of the default and each specific value

    of the attribute.

Rounding and Correctly Rounded Arithmetic:

We use the terminology "floating point numbers" to mean all acceptable numbers in a given IEEE floating point arithmetic format. This set consists of ±0, subnormal and normalized numbers, and ±∞, but not NaN values, and is a finite subset of the reals. We have seen that most real numbers, such as 1/10 and pi, cannot be represented exactly as floating point numbers. For ease of expression we will say a general real number is normalized if its modulus lies between the smallest and largest positive normalized floating point numbers, with a corresponding use of the word subnormal. In both cases the representations we give for these numbers will parallel the floating point number representations in that b0 = 1 for normalized numbers, and b0 = 0 with E = -126 for subnormal numbers.

For any number x which is not a floating point number, there are two obvious choices for the floating point approximation to x: the closest floating point number less than x (call it x-), and the closest floating point number greater than x (call it x+). The IEEE standard defines the correctly rounded value of x, which we shall denote round(x), as follows. If x happens to be a floating point number, then round(x) = x. Otherwise, the correctly rounded value depends on which of the following four rounding modes is in effect:

Round down: round(x) = x-.

Round up: round(x) = x+.

Round towards zero: round(x) is either x- or x+, whichever is between zero and x.

Round to nearest: round(x) is either x- or x+, whichever is nearer to x. In the case of a tie, the one with its least significant bit equal to zero is chosen.
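For illustration (not from the report), Python's decimal module can stand in for the four rounding directions above; here each mode rounds pi to four significant digits, matching the 3.142 / 3.141 discussion that follows.

from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN, ROUND_HALF_EVEN

x = Decimal("3.14159")
modes = [("round down", ROUND_FLOOR),
         ("round up", ROUND_CEILING),
         ("round towards zero", ROUND_DOWN),
         ("round to nearest", ROUND_HALF_EVEN)]
for name, mode in modes:
    # Context.plus() rounds its argument to the context precision (4 digits here).
    print(name, Context(prec=4, rounding=mode).plus(x))
# round down 3.141, round up 3.142, round towards zero 3.141, round to nearest 3.142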

If x is positive, then x- is between zero and x, so round down and round towards zero have the

    same effect. If x is negative, then x+ is between zero and x, so it is round up and round towards

    zero which have the same effect. In either case, round towards zero simply requires truncating

    the binary expansion, i.e. discarding bits.

    The most useful rounding mode, and the one which is almost always used, is round to nearest,

since this produces the floating point number which is closest to x. In the case of toy precision,

with x = 1.7, it is clear that

    round to nearest gives a rounded value of x equal to 1.75. When the word round is used

    without any qualification, it almost always means round to

nearest. In the more familiar decimal context, if we round the number pi = 3.14159 to four

    decimal digits, we obtain the result 3.142, which is closer to pi than the truncated result 3.141.

iv. Floating Point Arithmetic

    Although integers provide an exact representation for numeric values, they suffer from two

    major drawbacks:

    --the inability to represent fractional values

    -- a limited dynamic range.

    Floating point arithmetic solves these two problems at the expense of accuracy and, on some

    processors, speed. Most programmers are aware of the speed loss

associated with floating point arithmetic; however, they are blithely unaware of the problems with

    accuracy.

    For many applications, the benefits of floating point outweigh the disadvantages.

    A big problem with floating point arithmetic is that it does not follow the standard rules of

    algebra. Nevertheless, many programmers apply normal algebraic rules when using floating

    point arithmetic. This is a source of bugs in many programs. One of the primary

    goals of this section is to describe the limitations of floating point arithmetic so it can be properly

used. Normal algebraic rules apply only to infinite precision arithmetic. Let us consider the

simple statement x := x + 1, where x is an integer. On any modern computer this statement follows the


normal rules of algebra as long as overflow does not occur. That is, this statement is valid only

for certain values of x (minint <= x < maxint).


    smaller number, obtaining 1.68e1 which is even less correct. Extra digits available during a

    computation are known as guard digits (or guard bits in the case of a binary format). They

    greatly enhance accuracy during a long chain of computations.

The accuracy loss during a single computation usually isn't enough to worry about unless we are greatly concerned about the accuracy of our computations. However, if we compute a value which is the result of a sequence of floating point operations, the error can accumulate and greatly affect the computation itself. For example, suppose we were to add 1.23e3 with 1.00e0. Adjusting the numbers so their exponents are the same before the addition produces 1.23e3 + 0.001e3. The sum of these two values, even after rounding, is 1.23e3. This might seem perfectly reasonable; after all, we can only maintain three significant digits, and adding in a small value shouldn't affect the result at all.

However, suppose we were to add 1.00e0 to 1.23e3 ten times. The first time we add 1.00e0 to 1.23e3 we get 1.23e3. Likewise, we get this same result the second, third, fourth, ..., and tenth time we add 1.00e0 to 1.23e3. On the other hand, had we added 1.00e0 to itself ten times, then added the result (1.00e1) to 1.23e3, we would have gotten a different result, 1.24e3. This is the most important thing to know about limited precision arithmetic:

The order of evaluation can affect the accuracy of the result.
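A short Python sketch (illustrative, using the decimal module with three significant digits) reproduces the example above: adding 1.00e0 one at a time never changes the running total, while summing the ten small values first does.

from decimal import Decimal, getcontext

getcontext().prec = 3                        # three significant digits

total = Decimal("1.23E+3")
for _ in range(10):
    total += Decimal("1.00E+0")
print(total)                                 # 1.23E+3 -- the ten additions are lost

small_sum = sum(Decimal("1.00E+0") for _ in range(10))    # 1.00E+1
print(Decimal("1.23E+3") + small_sum)        # 1.24E+3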

    We can get more accurate results if the relative magnitudes (that is, the exponents) are close to

one another. Whenever a chain calculation involving addition and subtraction is being

performed, an attempt should be made to group the values appropriately. Another problem with

    addition and subtraction is that you can wind up with false precision. Consider the computation

    1.23e0 - 1.22 e0. This produces 0.01e0. Although this is mathematically equivalent to 1.00e-2,

this latter form suggests that the last two digits are exactly zero. Unfortunately, we've only got a

    single significant digit at this time. Indeed, some FPUs or floating point software packages might

    actually insert random digits (or bits) into the least significant positions. This brings up a second

    important rule concerning limited precision arithmetic:

    Whenever subtracting two numbers with the same signs or adding two numbers with different

    signs, the accuracy of the result may be less than the precision available in the floating point

    format. Multiplication and division do not suffer from the same problems as addition and

    subtraction since we do not have to adjust the exponents before the operation; all we need to do

    is add the exponents and multiply the mantissas (or subtract the exponents and divide the


    mantissas). By themselves, multiplication and division do not produce particularly poor results.

    However, they tend to multiply any error which already exists in a value. For example, if we

    multiply 1.23e0 by two, when we should be multiplying 1.24e0 by two, the result is even less

    accurate. This brings up a third important rule when working with limited precision arithmetic,

    When performing a chain of calculations involving addition, subtraction, multiplication,

    and division, try to perform the multiplication and division operations first.

    Often, by applying normal algebraic transformations, we can arrange a calculation so the

    multiply and divide operations occur first. For example, suppose we want to compute x*(y+z).

    Normally we would add y and z together and multiply their sum by x. However, we can get a

    little more accuracy if we transform x*(y+z) to get x*y+x*z and compute the result by

    performing the multiplications first. Multiplication and division are not without their own

    problems. When multiplying two very large or very small numbers, it is quite possible for

    overflow or underflow to occur. The same situation occurs when dividing a small number by a

    large number or dividing a large number by a small number. This brings up a fourth rule we

    should attempt to follow when multiplying or dividing values:

    When multiplying and dividing sets of numbers, try to arrange the multiplications

    so that they multiply large and small numbers together; likewise, try to divide numbers that have

    the same relative magnitudes.

Comparing floating point numbers is very dangerous. Given the inaccuracies present in any

    computation (including converting an input string to a floating point value), two floating point

    values should never be compared to see if they are equal. In a binary floating point format,

    different computations which produce the same (mathematical) result may differ in their least

significant bits. For example, adding 1.31e0 + 1.69e0 should produce 3.00e0. Likewise, adding

1.50e0 + 1.50e0 should produce 3.00e0. However, were you to compare (1.31e0 + 1.69e0) against

(1.50e0 + 1.50e0) we might find out that these sums are not equal to one another. The test for

    equality succeeds if and only if all bits (or digits) in the two operands are exactly the same. Since

    this is not necessarily true after two different floating point computations which should produce

    the same result, a straight test for equality may not work.

    The standard way to test for equality between floating point numbers is to determine how much

    error (or tolerance) you will allow in a comparison and check to see if one value is within this

    error range of the other. The straight-forward way to do this is to use a


test like the following:

if Value1 >= (Value2 - error) and Value1 <= (Value2 + error) then ...
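A hedged Python equivalent of this test is sketched below; the pair 0.1 + 0.2 versus 0.3 (not taken from the report) is a well-known case where two computations of the same mathematical value differ in their least significant bits, and math.isclose offers a relative-tolerance variant of the same idea.

import math

def roughly_equal(value1: float, value2: float, error: float) -> bool:
    return (value2 - error) <= value1 <= (value2 + error)

a = 0.1 + 0.2                               # 0.30000000000000004 in binary double precision
b = 0.3
print(a == b)                               # False: straight equality fails
print(roughly_equal(a, b, 1e-9))            # True: within the allowed error range
print(math.isclose(a, b, rel_tol=1e-9))     # True: relative-tolerance version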


A binary floating-point number x is represented as a significand and an exponent, x = s × 2^e.

    The formula

(s1 × 2^e1) × (s2 × 2^e2) = (s1 × s2) × 2^(e1+e2)

shows that a floating-point multiply algorithm has several parts. The first part multiplies the significands using ordinary integer multiplication. Because floating point numbers are stored in sign magnitude form, the multiplier need only deal with unsigned numbers (although we have seen that Booth recoding handles signed two's complement numbers painlessly). The second part rounds the result. If the significands are unsigned p-bit numbers (e.g., p = 24 for single precision), then the product can have as many as 2p bits and must be rounded to a p-bit number. The third part computes the new exponent.

    Because exponents are stored with a bias, this involves subtracting the bias from the sum

    of the biased exponents.
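The three parts just described can be sketched in Python for single precision (an illustration added here, not the report's VHDL design); for brevity the product is truncated rather than correctly rounded, and only normalized, nonzero operands are assumed.

import struct

def unpack_fields(x: float):
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_e = (bits >> 23) & 0xFF
    significand = (bits & 0x7FFFFF) | (1 << 23)     # restore the hidden leading 1
    return sign, biased_e, significand

def toy_multiply(a: float, b: float) -> float:
    sa, ea, ma = unpack_fields(a)
    sb, eb, mb = unpack_fields(b)
    sign = sa ^ sb
    biased_e = ea + eb - 127              # part 3: add exponents, subtract the bias
    product = ma * mb                     # part 1: unsigned multiply of 24-bit significands
    if product >= 1 << 47:                # product in [2, 4): normalize with one right shift
        product >>= 1
        biased_e += 1
    fraction = (product >> 23) & 0x7FFFFF # part 2 (simplified): keep 24 bits, drop hidden bit
    bits = (sign << 31) | (biased_e << 23) | fraction
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(toy_multiply(-8.0, 16.0))   # -128.0, i.e. (-1 * 2**3) * (1 * 2**4)
print(toy_multiply(1.5, 2.5))     # 3.75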

    Example

    How does the multiplication of the single-precision numbers

1 10000010 000. . . = -1 × 2^3

0 10000011 000. . . = 1 × 2^4

proceed in binary?

    Answer

    When unpacked, the significands are both 1.0, their product is 1.0, and so the

    result is of the form

    1 ???????? 000. . .

    To compute the exponent, use the formula

biased exp(e1 + e2) = biased exp(e1) + biased exp(e2) - bias

The bias is 127 = 01111111 in binary, so in two's complement -127 is 10000001. Thus, the biased exponent of the product is

  10000010
  10000011
+ 10000001
----------
  10000110


Since this is 134 decimal, it represents an exponent of 134 - bias = 134 - 127 = 7, as

    expected.

    The interesting part of floating-point multiplication is rounding. Since the cases are similar

    in all bases, the figure uses human-friendly base 10, rather than base 2.

For floating-point multiplication it is necessary to know about floating-point addition, since while performing floating-point multiplication we have to perform addition anyhow to get the final result. In performing this addition some carry may be generated, for which we have to renormalize the result, and in doing so it may lose precision bits. For that we take three extra bits: guard, round and sticky. Hence, it is important to know how addition occurs before looking at multiplication. The next section describes how addition is done and what procedures are followed in order to get the final result.

    Chapter 3

3. ADDITION ALGORITHM


Let a1 and a2 be the two numbers to be added. The notations ei and si are used for the exponent and significand of the addends ai. This means that the floating-point inputs have been unpacked and that si has an explicit leading bit. To add a1 and a2, perform these eight steps:

1. If e1 < e2, swap the operands. This ensures that the difference of the exponents satisfies d = e1 - e2 >= 0. Tentatively set the exponent of the result to e1.

2. If the signs of a1 and a2 differ, replace s2 by its two's complement.

3. Place s2 in a p-bit register and shift it d = e1 - e2 places to the right (shifting in 1s if s2 was complemented in the previous step). From the bits shifted out, set g to the most-significant bit, r to the next most-significant bit, and set the sticky bit s to the OR of the rest.

4. Compute a preliminary significand S = s1 + s2 by adding s1 to the p-bit register containing s2. If the signs of a1 and a2 are different, the most-significant bit of S is 1, and there was no carry-out, then S is negative. Replace S with its two's complement. This can only happen when d = 0.

5. Shift S as follows. If the signs of a1 and a2 are the same and there was a carry-out in step 4, shift S right by one, filling the high-order position with one (the carry-out). Otherwise shift it left until it is normalized. When left shifting, on the first shift fill in the low-order position with the g bit. After that, shift in zeros. Adjust the exponent of the result accordingly.

6. Adjust r and s. If S was shifted right in step 5, set r := low-order bit of S before shifting and s := g OR r OR s. If there was no shift, set r := g, s := r. If there was a single left shift, don't change r and s. If there were two or more left shifts, set r := 0, s := 0. (In the last case, two or more shifts can only happen when a1 and a2 have opposite signs and the same exponent, in which case the computation s1 + s2 in step 4 will be exact.)

7. Compute the sign of the result. If a1 and a2 have the same sign, this is the sign of the result.


If a1 and a2 have different signs, then the sign of the result depends on which of a1, a2 is negative, whether there was a swap in step 1, and whether S was replaced by its two's complement in step 4.

    3.1 ABOUT FLOATING POINT ARITHMETIC

    Arithmetic operations on floating point numbers consist of addition, subtraction, multiplication

and division. The operations are done with algorithms similar to those used on sign-magnitude integers (because of the similarity of representation) -- for example, only add numbers of the same sign. If the numbers are of opposite sign, a subtraction must be done.

    ADDITION

    Example on decimal value given in scientific notation:

    3.25 x 10 ** 3

    + 2.63 x 10 ** -1

    -----------------

    first step: align decimal points

    second step: add

    3.25 x 10 ** 3

    + 0.000263 x 10 ** 3

    --------------------

    3.250263 x 10 ** 3

    (presumes use of infinite precision, without regard for accuracy)

    third step: normalize the result (already normalized!)

    example on fl pt. value given in binary:

    .25 = 0 01111101 00000000000000000000000

    100 = 0 10000101 10010000000000000000000

    to add these fl. pt. representations,

    step 1: align radix points

    shifting the mantissa LEFT by 1 bit DECREASES THE EXPONENT by 1

    shifting the mantissa RIGHT by 1 bit INCREASES THE EXPONENT by 1

we want to shift the mantissa right, because the bits that fall off the end should come from the least significant end of the mantissa

-> we choose to shift the .25, since we want to increase its exponent.

-> shift by:

   10000101
 - 01111101
 ----------
   00001000  (8) places.

    0 01111101 00000000000000000000000 (original value)

0 01111110 10000000000000000000000 (shifted 1 place)

    (note that hidden bit is shifted into msb of mantissa)

    0 01111111 01000000000000000000000 (shifted 2 places)

    0 10000000 00100000000000000000000 (shifted 3 places)

    0 10000001 00010000000000000000000 (shifted 4 places)

    0 10000010 00001000000000000000000 (shifted 5 places)

    0 10000011 00000100000000000000000 (shifted 6 places)

    0 10000100 00000010000000000000000 (shifted 7 places)

    0 10000101 00000001000000000000000 (shifted 8 places)

step 2: add (the hidden bit for the 100 shouldn't be forgotten)

    0 10000101 1.10010000000000000000000 (100)

    + 0 10000101 0.00000001000000000000000 (.25)

    ---------------------------------------

    0 10000101 1.10010001000000000000000

    step 3: normalize the result (get the "hidden bit" to be a 1)

    it already is for this example.

    result is: 0 10000101 10010001000000000000000
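As a hedged cross-check (Python, added purely for illustration), packing 100 + 0.25 into single precision reproduces exactly the bit pattern obtained above.

import struct

bits = struct.unpack("<I", struct.pack("<f", 100.0 + 0.25))[0]
print(format(bits, "032b"))    # 01000010110010001000000000000000
# sign = 0, exponent = 10000101 (133, i.e. 2**6), fraction = 10010001 followed by zeros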

CONCLUSION


For floating-point multiplication it is necessary to know about floating-point addition, since while performing floating-point multiplication we have to perform addition anyhow to get the final result. In performing this addition some carry may be generated, for which we have to renormalize the result, and in doing so it may lose precision bits. For that we have to take three extra bits: guard, round and sticky. It is important to know how addition occurs and then multiplication. This report has described how addition is done and what procedures are followed in order to get the final result. We have now studied and gathered ideas about floating point addition, which will be helpful for us while doing the multiplication part in the next semester as our major project.


REFERENCES

Liang-Kai Wang and Michael J. Schulte, "Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding," 18th IEEE Symposium on Computer Arithmetic (ARITH'07), 2007.

G. Even and P. M. Seidel, "A comparison of three rounding algorithms for IEEE floating-point multiplication," IEEE Transactions on Computers, 49(7), July 2000.

N. Burgess, "Renormalization rounding in IEEE floating-point operations using a flagged prefix adder," IEEE Transactions on VLSI Systems, 13(2):266-277, Feb 2005.

IEEE Standard for Floating-Point Arithmetic, IEEE, 3 Park Avenue, New York, NY 10016-5997, USA, 29 August 2008. IEEE Computer Society, sponsored by the Microprocessor Standards Committee.

M. S. Schmookler and A. W. Weinberger, "High speed decimal addition," IEEE Transactions on Computers, C-20:862-867, Aug 1971.