456

IMPLEMENTATION OF 16-BIT FLOATING POINT MULTIPLIER USING RESIDUE NUMBER

SYSTEM

Naamatheertham R Samhitha, Neethu Acha Cherian, Pretty Mariam Jacob, P Jayakrishnan

Abstract This paper aims at the implementation of 16 bit

floating point multiplier using Residue Number system. Residue Number System (RNS), which is a non-weighted number system gains popularity in the implementation of fast and parallel computing applications. It has inherent properties such as modularity, parallelism and carry free computation, which speeds up the arithmetic computations. Floating Point can be represented as EBM where M is Mantissa, E is the Exponent and B is the base. Floating point RNS multiplier consists of RNS Exponent modulo Adder and RNS Mantissa modulo multiplier. In this paper, floating point multiplier will be implemented using Verilog HDL using ModelSim and synthesized using Altera Cyclone II Quartus and frequency of multiplier is found to be 311MHz.

Index Terms Multipliers, Residue Number System (RNS), Floating point, Modulus

INTRODUCTION Multiplier is one of the important blocks in digital

applications such as Digital Signal Processing, filtering and microprocessors. There are different designs of multipliers which can be used for optimizing the performance like high speed, low power and less area. There are two types of arithmetic operations. They are fixed and floating point operations. Fixed point number was inefficient for big number arithmetic. So floating point arithmetic was invented. Real numbers can be represented as a floating point number with two parts, mantissa and exponent. A floating point number is represented as EBM where M is mantissa, B is base and E is exponent. The floating point representation used here is a half-precision. Size of mantissa indicates precision and size of exponent indicates the range. In recent times Residue Number System is becoming popular because of its carry free addition and multiplication capabilities. Since there is no carry propagation between arithmetic blocks, high speed processing can be obtained. RNS representation encodes large numbers into small residues so that computation can be performed more efficiently. The arithmetic can be implemented parallely for these residues. This ensures that there is no dependency between each modulo unit. So, the complexity of the arithmetic units in each modulo unit is reduced. Here we need some extra hardware for converting binary to residue and residue to binary.

The RNS is defined by set of co-primes called moduli. The moduli set is denoted by }...,,{ 321 immmm , where im is the ith modulus. Each integer X can be represented as set of small integers called residues. This residues are denoted by

}...,,{ 321 irrrr , where ir is the ith residue. Integer X is divided

by each moduli and corresponding remainder obtained is taken as residue. Mathematically representation is given by Equation 1.

ii mXr mod= (1) The entire paper is divided into six sections. In first section

introduction about multiplier, floating point and RNS is discussed. In second section the overall architecture of the floating point RNS multiplier is used. Third section discusses about Floating point multiplier algorithm. Fourth section deals with the converters used i.e. Binary to RNS Converter and RNS to Binary Converter. Fifth section explains about results and conclusions that are obtained.

ARCHITECTURE OF FLOATING POINT RNS MULTIPLIER

The modules that are required for implementing a Floating Point RNS Multiplier are Binary to RNS Converter, Modulo Adder, Modulo Multiplier, and RNS to Binary Converter. Floating point multiplication involves multiplication of mantissa and addition of exponent. So, Floating Point in residue domain includes a RNS modulo multiplier for mantissa and RNS modulo adder for exponent. The block diagram of Floating point RNS multiplier is shown in Fig. 2. Basically floating point inputs are given as Mantissa and Exponent. The floating point representation incorporates a half-precision notation as shown in Fig. 1 Sign bit of 1-bit, Mantissa is of 10-bit and Exponent is of 5-bit.

Fig. 1. Floating Point Representation

The flow of operations is as follows:

1. The unbiased Exponent is converted to biased by adding 15 (this value depends on number of bits

Sign (1-bit) Exponent (5-bit) Mantissa

195978-1-4673-6126-2/13/$31.00 c2013 IEEE

used to represent exponent). This biasing is done to ensure that the Exponent is unsigned.

2. The Mantissa and biased Exponent is converted to Residue Number System. In RNS, based on the moduli, residues are obtained.

3. For multiplication, the Mantissa should be multiplied and Exponent should be added. For this, an RNS Mantissa modulo multiplier and RNS Exponent modulo adder are used.

Fig. 2. Block Diagram of Floating Point RNS Multiplier

For binary to RNS conversion the moduli set we have chosen is 12{ n , n2 , }12 +n , where n is decided based on the number of bits of the input binary number. By making use of these moduli a particular binary number can be converted into corresponding residues. This set of moduli makes the forward conversion process fast and simple. In general, forward converters based on these moduli set are the most efficient converters.

FLOATING POINT MULTIPLIER ALGORITHM There are different methods for multiplication in residue

domain such as look up tables, array of adders and combination of look up tables and adders [1], [2]. For large number of bits the delay and area of residue array multiplier is preferred [1]. Here for residue array multiplier Brickells algorithm is used.

The algorithm used is as follows: 1. Algorithm for RNS Modulo multiplier Initialize

accumulator to 0. For 0=i to 1N , where N is no of bits of multiplier.

2. Double the contents of accumulator and add partial products to it. Partial products are obtained by AND operation of Bi and A where A is multiplicand and Bi is ith bit of multiplier.

3. Check the contents of accumulator (acc), if

acc > im , then acc = acc - im acc > 2 im , then acc = acc - 2 im

Where im is the modulus.

The addition in residue domain is done by using modulo adder shown in Fig. 3. The algorithm of RNS modulo adder is as follows. Let iii bac += mod im , where ia and ib are the residues and im is the modulus, then

ic = ia + ib , if ia + ib < im

ic = ia + ib - im , if ia + ib >= im

Fig. 3. Modulo M adder

In order to perform modulo m addition [3] as shown in Fig. 3, the adder structure which we have used here is a Parallel Prefix adder called Kogge-stone adder.

BINARY TO RNS & RNS TO BINARY CONVERTER

Binary to RNS Converter The data available will be in the form of binary. In order to

process in RNS, the binary data has to be converted into RNS. The process of converting binary data into RNS is referred to as the forward conversion. The forward converter must be area, power and speed efficient. After the data is being processed through the modulo processing units of RNS, they must be converted back to their conventional representations. The process of converting back into the conventional representations is referred to as backward conversion. The forward conversion can be for an arbitrary moduli set or special moduli set. The special moduli set used in this paper include

}12,2,12{ + nnn [4], [5], where n is decided based on the number of bits of the input binary number. Since the exponent is of 5-bits, the moduli set required for the exponent is {3, 4, 5} which enables to represent 0 to 25-1 in the RNS form. Similarly, as the mantissa is of 10bits, the moduli set considered is {255, 256, 257} which gives the RNS representation for numbers from 0 to 210-1. These special moduli set are considered for the RNS operation because they make the system fast and simple along with being efficient. In order to obtain the residue of an input binary number with

196 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE)

respect to a certain modulus, the distributive property of the addition in RNS is taken into consideration i.e.,

mmmm YXYX |||||||| +=+ (2)

Consider a binary number X given by X = xn-1xn-2....x1x0 which can be given as

=

=

1

02

n

jj

j xX

The modulus of the binary number using equation 2 gives,

mX || =

=

1

0|2|

n

jmj

j x =

=

1

0||2||

n

jmmj

j x

Where jx can be either 0 or 1. This paper includes a parallel method of forward

conversion of the input binary block. The input binary blocks are being divided into blocks each of ''n bits. Thus, if a given binary number X is divided into blocks, 011 ,... BBBk , then we have,

=

=

1

02

n

jj

jB BX

Hence, we can have

mX || =

=

=

1

0|2|

n

jmj

jB B

=

1

0||2||

n

jmmj

jB B

Due to the usage of a special moduli set }12,2,12{ + nnn ,

the number to be converted to RNS is divided into three blocks each of n bits. The three blocks 321 ,, BBB are represented as given below.

=

=

13

2

21 2

n

njj

nj xB

=

=

12

2 2n

njj

nj xB

=

=

1

03 2

n

jj

j xB

Thus, the input binary number, X can be given in terms of the blocks as,

322

1 22 BBBXnn ++=

Then the residues can be obtained as follows,

=1r 12321 || ++ nBBB

=2r 2B =3r 12321 || ++ nBBB

The overall architecture of Binary to RNS converter architecture is given in Fig. 4. In this for calculating residue 1r ,

blocks 1B and 2B are first added using modulo 12 n adder

and then the result is added with the block 3B using same

modulo 12 n adder. Whereas for calculating residue 3r ,

block 1B is added with the 2s compliment of block 2B using modulo 12 +n adder and then the result is added with the block 3B using same modulo 12 +

n adder.

Fig. 4. Binary to RNS Converter for moduli set }12,2,12{ + nnn A typical architecture for the implementation of a forward

converter from binary to RNS representation is given in Fig. 4. This indicates that the residue 2r , is just the n significant bits

of the input binary number X. 1r and 3r are obtained with the help of a 12 n and 12 +n modulo adders. The modulo-m adder forms the basic arithmetic unit for any RNS operation or RNS conversion. The basic modulo-m adder architecture is given in Fig. 3. The adder structure can be any conventional adder that can be used like a ripple carry adder (RCA), a carry look ahead adder or any parallel prefix tree. In this structure, a kogge stone adder has been used. Since we are providing mantissa and exponent binary inputs separately, they should be converted separately using the moduli mentioned for them respectively. When a 5-bit binary exponent is converted into RNS using moduli set {3,4,5} then the residues each of 3-bit is obtained. So we are considering a 3-bit Kogge-Stone adder as in Fig. 5. Similarly when a 10-bit binary exponent is converted into RNS then the residues each of 9-bit is obtained. So for this we are considering 9-bit Kogge-Stone adder as in Fig. 6.

Fig. 5. 3-bit Kogge Stone adder

2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE) 197

Fig. 6. 9-bit Kogge Stone adder

RNS to Binary Converter RNS to binary conversion is the bottle neck in using RNS.

RNS to Binary conversion is based on two algorithms [1659138]. They are mixed radix conversion (MRC) and Chinese remainder theorem (CRT). MRC is sequential whereas CRT is parallel. The design of modulo M adder in the final stage of CRT is design bottleneck of it. In this paper CRT algorithm is used for conversion of RNS to binary. The mathematical equations for CRT are as follows. Given a set of moduli }...,,{ 321 immmm and the residues are }...,,{ 321 irrrr , then binary number X is given as

=mX || =

n

iMimii MMr i

1

1 ||||

Where M is the product of modulus m and iM . So CRT requires three main processes.

1) Calculating iM and inverses imiM ||1

2) Multiplying iM and inverses with residues and accumulating it.

3) Modulo M adder as the final stage.

RESULTS The simulation results of 16 bit floating point RNS

multiplier are shown in Fig. 7.The implementation of 16 bit floating point RNS multiplier using Altera Cyclone II FPGA requires 150 logic elements (LE). The maximum frequency required by 16 bit floating point RNS multiplier is 311MHz. The synthesis results on Altera Cyclone II Quartus are tabulated in Table I.

SYNTHESIS RESULTS OF 16 BIT FLOATING POINT MULTIPLIER

Parameters Value

Logic Elements Used 150

Frequency 311MHz

CONCLUSION The 16 bit floating point multiplier unit is designed in

verilog and implemented in Altera Cyclone II FPGA. By using Residue Number System parallel and carry free arithmetic can be obtained. The large number is represented in the form of residues. So addition and multiplication is performed parallely on all the residues. Using residues for multiplication avoids the partial products generation and thus makes it less complicated and more efficient. Thus arithmetic is performed in a faster rate. This 16 floating point multiplier is found to be of high speed and high performance.

REFERENCES [1] C.-L. Chiang and L. Johnson, Residue arithmetic and VLSI, IEEE International Conference on Computer Design: VLSI in computers, 1983. [2] J.-S. Chiang and M. Lu, Floating-point numbers in residue number systems, Computers and Mathematics with Applications, vol. 22, no. 10, pp. 127 140, 1991. [3] M. Dugdale, VLSI implementation of residue adders based on binary adders, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol. 39, no. 5, pp. 325329, 1992. [4] J. claude Bajard and T. Plantard, RNS bases and conversions. [5] F. Taylor, Residue arithmetic a tutorial with examples, Computer, vol. 17, no. 5, pp. 5062, 1984. [6] A. Hiasat, High-speed and reduced-area modular adder structures for RNS, Computers, IEEE Transactions on, vol. 51, no. 1, pp. 8489, 2002. [7] A. Ghosh, S. Singha, and A. Sinha, Floating point RNS: a new concept for designing the mac unit of digital signal processor, SIGARCH Comput. Archit. News, vol. 40, no. 2, pp. 3943, May 2012. [8] E. Kinoshita, H. Kosako, and Y. Kojima, Floating-point arithmetic algorithms in the symmetric residue number system, Computers, IEEE Transactions on, vol. C-23, no. 1, pp. 920, 1974.

Fig. 7. Simulation results for floating point RNS Multiplier

198 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE)

/ColorImageDict > /JPEG2000ColorACSImageDict > /JPEG2000ColorImageDict > /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 150 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages false /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict > /GrayImageDict > /JPEG2000GrayACSImageDict > /JPEG2000GrayImageDict > /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 600 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict > /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False

/CreateJDFFile false /Description >>> setdistillerparams> setpagedevice

456

Documents

Transcript of 456