DSP Arithmetic

Post on 13-Apr-2015


fixed point and floating point representations


Contents

• Fixed point representation
• Floating point representation
• Some math operations:
  • Addition
  • Subtraction
  • Multiplication
  • Division
• Comparison

Introduction

• Practical DSP implementation should take into consideration:
  • Possible quantization errors
  • Arithmetic errors
  • Possible overflow

• A DSP processor’s data format determines its ability to handle signals of different precisions, dynamic ranges, and SQNRs.

• In order to write efficient programs for DSP applications, we must understand how the processor manipulates data.


FIXED POINT NOTATION


Some fixed-point processors:

• TMS320C64xx processors
• ADSP2101 processor

Fixed point notation

• Fixed point DSPs usually represent each number with a minimum of 16 bits, although a different length can be used.
• There are four common ways that these $2^{16}$ (65,536) possible bit patterns can represent a number:
  • Unsigned integer
  • Signed integer
  • Unsigned fraction
  • Signed fraction

Fixed Point Representation

• In unsigned integer format, the stored number can take on any integer value from 0 to 65,535.
• Example: consider a 4-bit representation. We can represent numbers in the range 0 to 15.
  • $4_{10} = 0100_2$
  • $8_{10} = 1000_2$
• However, if the result of an arithmetic operation exceeds $15_{10}$, overflow occurs.

Fixed Point Representation

• Similarly, signed integer format uses two's complement to make the range include negative numbers, from -32,768 to 32,767.
• Example: consider a 4-bit representation. We can represent numbers in the range -8 to 7.
  • $5_{10} = 0101_2$, $-5_{10} = 1011_2$
• Two's complement is used to represent signed numbers. The most significant bit acts as the sign bit.
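The 4-bit two's complement example above can be checked with a short Python sketch. The function names `to_twos_complement` and `from_twos_complement` are illustrative and not taken from the slides; the range check mirrors the overflow condition described for the unsigned case.

```python
def to_twos_complement(value, bits=4):
    """Return the bit pattern (as a string) of a signed integer in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        # The value cannot be represented: this is the overflow case.
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    # Masking a negative Python int yields its two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Interpret a bit string as a two's complement signed integer."""
    raw = int(pattern, 2)
    # A leading 1 means the number is negative: subtract 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(5), to_twos_complement(-5))  # 0101 1011
```

With 4 bits, any value outside -8 to 7 raises `OverflowError` instead of silently wrapping.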

Fractional fixed point notation

• Used for representing numbers with both integer and fractional parts.
• The Qm.n convention uses m bits to represent the integer portion of the number and n bits to represent the fractional portion.
• Total no. of bits: N = m + n + 1

Word layout (MSB to LSB): 1 sign bit | m integer bits | radix point | n fractional bits

Q15 format

• For example, a 16-bit number that uses 1 sign bit and 15 bits for the fractional part is called Q0.15 format, or simply the Q15 format.
• Q15 format is commonly used in DSP systems, and data must be properly scaled so that their values lie between -1 and 0.999969482421875 (that is, $1 - 2^{-15}$).
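As a quick check of the Q15 limits quoted above, the range of any signed Qm.n format can be computed directly. This is a minimal sketch; `q_range` is a hypothetical helper name, and it assumes the N = m + n + 1 layout with one sign bit.

```python
def q_range(m, n):
    """Return (minimum, maximum, step) of a signed Qm.n number.

    Assumes 1 sign bit, m integer bits and n fractional bits.
    """
    step = 2.0 ** -n              # weight of the least significant bit
    minimum = -(2.0 ** m)         # most negative two's complement value
    maximum = 2.0 ** m - step     # one step below +2**m
    return minimum, maximum, step


print(q_range(0, 15))  # Q15: (-1.0, 0.999969482421875, 3.0517578125e-05)
```

The same helper shows the trade-off between formats: Q3.12 covers -8 to just under 8, but with a step eight times coarser than Q15.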

Fractional fixed point notation

• With unsigned fraction notation, the 65,536 levels are spread uniformly between 0 and 1.
• Example: consider a 3-bit unsigned fraction representation.

Number   Decimal fraction   Fractional notation
0        0                  000
1        1/8                001
2        2/8                010
3        3/8                011
4        4/8                100
5        5/8                101
6        6/8                110
7        7/8                111

Fractional fixed point notation

• Lastly, the signed fraction format allows negative numbers, equally spaced between -1 and 1.
• Example: consider a 3-bit signed fraction representation.

Number   Decimal fraction   Fractional notation
3        3/4                011
2        2/4                010
1        1/4                001
0        0                  000
-1       -1/4               111
-2       -2/4               110
-3       -3/4               101
-4       -1                 100
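The two 3-bit tables can be reproduced by interpreting a bit pattern as a fraction. The sketch below assumes the conventions shown in the tables: unsigned patterns are divided by 2^N, signed patterns are two's complement values divided by 2^(N-1). The function names are illustrative.

```python
def unsigned_fraction(pattern):
    """Interpret a bit string as an unsigned fraction in [0, 1)."""
    return int(pattern, 2) / (1 << len(pattern))


def signed_fraction(pattern):
    """Interpret a bit string as a two's complement fraction in [-1, 1)."""
    bits = len(pattern)
    raw = int(pattern, 2)
    if raw >= 1 << (bits - 1):    # sign bit set: value is negative
        raw -= 1 << bits
    return raw / (1 << (bits - 1))


for code in range(8):
    p = format(code, "03b")
    print(p, unsigned_fraction(p), signed_fraction(p))
```

The same pattern, for example 101, reads as 5/8 in the unsigned table and as -3/4 in the signed one, so the interpretation has to be agreed on by the programmer.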

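Example

• Represent the decimal number 0.95624 as
  • a Q3 number, and
  • a Q4 number.
• Q3 number:
  • A Q3 number is a 2's complement number with one sign bit and 3 fractional bits.
  • $0.95624 \times 2^3 = 7.64992$. The nearest integer, 8, does not fit in Q3 (the largest code is 7), so the number is represented as 7 = 0111.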

Example (contd.)

• Q4 number:
  • A Q4 number is a 2's complement number with one sign bit and 4 fractional bits.
  • $0.95624 \times 2^4 = 15.29984$. This no. can be rounded to 15 = 01111.

Example (contd.)

• Errors in representation:
  • Case-1: Q3 notation: $0.95624 - 7/8 = 0.95624 - 0.875 = 0.08124$
  • Case-2: Q4 notation: $0.95624 - 15/16 = 0.95624 - 0.9375 = 0.01874$
• The error in representing the number is often referred to as coefficient quantization error.
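The Q3 and Q4 codes and their quantization errors can be recomputed with a small sketch. It assumes round-to-nearest followed by saturation to the largest representable code, which is one common choice and not necessarily the rule used on the original slides; `quantize` is a hypothetical helper.

```python
def quantize(x, n):
    """Quantize x to a Qn code (1 sign bit, n fractional bits).

    Returns (integer code, value actually represented).
    Assumption: round to nearest, then saturate to the valid code range.
    """
    scale = 1 << n
    code = round(x * scale)
    code = max(-scale, min(scale - 1, code))   # saturate instead of wrapping
    return code, code / scale


x = 0.95624
for n in (3, 4):
    code, value = quantize(x, n)
    print(f"Q{n}: code={code}, value={value}, error={x - value:.5f}")
```

Adding one fractional bit reduces the error in this example from about 0.081 to about 0.019.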

Implementation

• Most fixed-point DSP processors use two's complement fractional numbers in different Q formats.

• However, assemblers only recognize integer values.

• The programmer must keep track of the position of the binary point when manipulating fractional numbers in assembly programs.

• The following steps convert a fractional number in Q format into an integer value that can be recognized by the assembler. Let us see this with an example:


Implementation (contd.)

• Assume that the coefficient to be used is 1.18 and that the DSP processor uses Q15 format.
• Step 1: Normalize the fractional number to the range determined by the desired Q format.
  • For Q15 format the range is [-1, 1). Normalize the number to this range. Thus, $1.18 / 2 = 0.59$.
• Step 2: Multiply the normalized fractional number by $2^n$, where n is the no. of fractional bits.
  • Multiply 0.59 by $2^{15}$. Thus, $0.59 \times 32768 = 19333.12$.
• Step 3: Round the product to the nearest integer.
  • Round the decimal value 19,333.12 to obtain 19,333.
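The three steps above can be sketched in Python as follows. The helper name `to_q15` and its `divisor` parameter are assumptions made for illustration; the final saturation step is an addition not mentioned on the slide, included so that values just below 1 do not round up to an unrepresentable code.

```python
def to_q15(coefficient, divisor=1):
    """Convert a coefficient to a Q15 integer that an assembler can accept.

    `divisor` is the normalization factor chosen by the programmer (step 1).
    """
    normalized = coefficient / divisor            # step 1: bring into [-1, 1)
    if not -1.0 <= normalized < 1.0:
        raise ValueError("normalized value is outside the Q15 range [-1, 1)")
    scaled = normalized * (1 << 15)               # step 2: multiply by 2**15
    code = round(scaled)                          # step 3: nearest integer
    return min(code, (1 << 15) - 1)               # assumption: saturate at 32767


print(to_q15(1.18, divisor=2))  # 19333
```

The programmer has to remember that the coefficient was halved, because the factor of 2 must be restored later in the algorithm.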

Implementation (contd.)

• The arithmetic result obtained by a DSP processor is in integer form. It can be interpreted as a fractional value by dividing by $2^n$.
• This is equivalent to shifting the binary point n bits to the left.
• In DSP implementation, it is not always necessary to use Q15 format throughout the DSP algorithm; instead, we can use different Q formats for different dynamic range requirements.
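Going the other way, an integer result is read back as a fraction by dividing by 2^n. The sketch below uses a hypothetical helper `from_q` and also shows that the same 16-bit word has a different value under a different Q format.

```python
def from_q(code, n):
    """Interpret an integer code as a fraction with n fractional bits."""
    return code / (1 << n)


q15_code = 19333                      # the Q15 code of 0.59
print(from_q(q15_code, 15))           # close to 0.59
print(from_q(q15_code, 15) * 2)       # undo the earlier normalization by 2

word = 0x4000
print(from_q(word, 15))               # read as Q15  -> 0.5
print(from_q(word, 12))               # read as Q3.12 -> 4.0
```

Reading 0x4000 as Q15 gives 0.5, while reading it as Q3.12 gives 4.0. Choosing a format trades fractional resolution for dynamic range.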

Binary addition-Example

• Addition of two 4-bit numbers represented in Q3 format:

[Worked examples: one sum with no overflow, one sum with overflow]

• Thus, addition of two numbers in fractional representation can result in overflow.
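Since the worked sums from the slide are not preserved in the text, the following sketch illustrates both cases with operands chosen here for illustration: 0.5 + 0.25 (no overflow) and 0.5 + 0.75 (overflow). It assumes a 4-bit two's complement adder that wraps around on overflow.

```python
def wrap4(value):
    """Keep the low 4 bits and reinterpret them as two's complement."""
    value &= 0xF
    return value - 16 if value & 0x8 else value


def add_q3(a, b):
    """Add two Q3 codes; return (wrapped 4-bit result, overflow flag)."""
    exact = a + b
    wrapped = wrap4(exact)
    return wrapped, wrapped != exact


for a, b in [(4, 2), (4, 6)]:          # 0.5 + 0.25 and 0.5 + 0.75 (hypothetical operands)
    result, overflow = add_q3(a, b)
    print(f"{a/8} + {b/8} -> {result/8} overflow={overflow}")
```

In the second case the exact sum, 1.25, lies outside [-1, 1), and the wrapped 4-bit result reads as -0.75, which is why overflow has to be detected or prevented by scaling.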

Binary multiplication-Example

• Multiplying two 4-bit numbers in Q.3 format requires a 7-bit word in Q.6 format to store the product, and there is no overflow.
• We want to store the result in a 4-bit word and hence truncate the product, 0.65625 (0.101010 in Q.6), to its four most significant bits (0.101, i.e. 0.625).
• Then, the error is 0.65625 - 0.625 = 0.03125.
• Multiplication in Q format does not result in overflow except in the case of $(-1) \times (-1) = +1$ (which is not in the range).
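The truncation error quoted above can be reproduced with a short sketch. The operands 0.75 and 0.875 are an assumption: they are one pair of Q3 values whose exact product is 0.65625, and the slide's own operands are not preserved in the text.

```python
def mul_q3(a, b):
    """Multiply two Q3 codes.

    Returns (full-precision Q6 product, product truncated back to Q3).
    """
    product_q6 = a * b                 # Q3 x Q3 gives 6 fractional bits
    truncated_q3 = product_q6 >> 3     # drop the 3 least significant bits
    return product_q6, truncated_q3


p6, p3 = mul_q3(6, 7)                  # 0.75 x 0.875 (assumed operands)
print(p6 / 64, p3 / 8, p6 / 64 - p3 / 8)   # 0.65625 0.625 0.03125

p6, p3 = mul_q3(-8, -8)                # (-1) x (-1)
print(p6 / 64, p3)                     # 1.0 and code 8, which exceeds the Q3 maximum of 7
```

The second call shows the single overflow case: the product +1 needs code 8, one more than the largest Q3 code.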

Binary division

• Hardware implementation of division is expensive.
• Therefore, most processors do not provide a single-cycle divide instruction supported by the hardware.
• For an N-bit fractional number, fractional division can be realized by repeating the conditional subtraction instruction (N-1) times.
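A software sketch of division by repeated conditional subtraction is shown below. It is an illustration of the idea and not a model of any particular processor instruction; it assumes positive operands with the numerator smaller than the denominator, so that the quotient is itself a fraction in [0, 1). The name `fractional_divide` is hypothetical.

```python
def fractional_divide(num, den, n_bits=16):
    """Divide two positive fractions given as Q(n_bits-1) integer codes.

    Requires 0 <= num < den. Returns the quotient as a Q(n_bits-1) code.
    """
    if not 0 <= num < den:
        raise ValueError("expected 0 <= num < den")
    quotient, remainder = 0, num
    for _ in range(n_bits - 1):        # one conditional subtraction per quotient bit
        remainder <<= 1
        quotient <<= 1
        if remainder >= den:           # subtract only if the result stays non-negative
            remainder -= den
            quotient |= 1
    return quotient


q = fractional_divide(8192, 16384)     # 0.25 / 0.5 in Q15
print(q, q / 32768)                    # 16384 0.5
```

Each pass produces one quotient bit, so a 16-bit fraction takes 15 passes, which matches the (N-1) repetitions stated above.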

FLOATING POINT ARITHMETIC

Floating Point Processors

• TMS320C3x
• TMS320C67x
• ADSP2106x

• Floating point formats allow numbers to be represented with a large dynamic range.

• Thus, floating point arithmetic can reduce the problem of overflow that occurs in fixed point arithmetic.


Floating point formats

• A binary floating point number X is represented as $X = M \times 2^E$, where the mantissa M and the exponent E are two signed numbers.
• The exponent determines the range of numbers that can be represented; the mantissa determines their accuracy.
• For example, with a 16-bit mantissa and an 8-bit exponent:
• Range of numbers that can be represented:

IEEE floating point format

• IEEE 754 Standard:
• The decimal equivalent, X, of a normalized IEEE floating point number is given by $X = (-1)^s \times 1.F \times 2^{E-127}$
• Where,
  • F is the fractional part of the mantissa, stored as an unsigned binary fraction (the sign is carried separately by s)
  • E is the exponent in excess 127 form
  • s = 0 for positive no.s, s = 1 for negative no.s

Bit 31: s (sign) | Bits 30 to 23: exponent (8 bit) | Bits 22 to 0: mantissa (23 bit)

Fig: Floating point representation (IEEE single precision)
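The single-precision fields can be inspected with Python's standard `struct` module. This is a minimal sketch for normalized numbers only (zeros, denormals, infinities and NaNs are not handled); `decode_single` and `rebuild_normalized` are illustrative names.

```python
import struct


def decode_single(x):
    """Split a number, stored as IEEE 754 single precision, into (s, E, F)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    s = word >> 31                 # bit 31: sign
    e = (word >> 23) & 0xFF        # bits 30..23: exponent in excess-127 form
    f = word & 0x7FFFFF            # bits 22..0: fraction field
    return s, e, f


def rebuild_normalized(s, e, f):
    """Evaluate (-1)**s * 1.F * 2**(E - 127) for a normalized number."""
    return (-1) ** s * (1 + f / (1 << 23)) * 2.0 ** (e - 127)


s, e, f = decode_single(-6.5)
print(s, e, hex(f))                    # 1 129 0x500000
print(rebuild_normalized(s, e, f))     # -6.5
```

Here -6.5 is stored as 1.625 x 2^2 with the sign bit set, so the exponent field holds 2 + 127 = 129.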

Floating point addition

• In order to perform floating point addition, we have to adjust the exponent of the smaller number to match that of the bigger number.
• Consider two floating point numbers, X and Y.
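The exponent alignment step can be sketched with numbers written as (mantissa, exponent) pairs, where the value is mantissa x 2^exponent. The operands below are hypothetical and are not the ones from the slides; the result is not renormalized, which a real processor would do if the mantissa sum left its allowed range.

```python
def fp_add(x, y):
    """Add two (mantissa, exponent) pairs; value = mantissa * 2**exponent."""
    (mx, ex), (my, ey) = x, y
    if ex < ey:                               # make x the operand with the larger exponent
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    my_aligned = my / 2 ** (ex - ey)          # shift the smaller operand's mantissa right
    return mx + my_aligned, ex                # exponents now match, so mantissas add


m, e = fp_add((0.5, 3), (0.75, 1))            # 4.0 + 1.5 (hypothetical operands)
print(m, e, m * 2 ** e)                       # 0.6875 3 5.5
```

Shifting the smaller operand to the right discards its low-order bits in real hardware, which is where floating point addition loses precision.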

Example

• We are given two floating point numbers
• Here,
• So, in the result: s = 1, mantissa = 0.215, exp = 3 + 127 = 130

Floating point multiplication-Example

• So, in the result: s = 1, mantissa = 0.8544, exp = 4 + 127 = 131
• The mantissas of the two numbers are multiplied, while the exponent terms are added without the need to align them.
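Multiplication needs no alignment, as the following sketch with (mantissa, exponent) pairs shows. The operands are hypothetical, and the normalization rule (mantissa magnitude kept in [0.5, 1)) is an assumption made for this illustration and is not the IEEE 1.F convention.

```python
def fp_mul(x, y):
    """Multiply two (mantissa, exponent) pairs; value = mantissa * 2**exponent."""
    (mx, ex), (my, ey) = x, y
    m, e = mx * my, ex + ey            # multiply mantissas, add exponents
    # Assumed normalization: keep |m| in [0.5, 1). The product of two
    # fractions is always below 1 in magnitude, so only left shifts occur.
    while m != 0 and abs(m) < 0.5:
        m *= 2
        e -= 1
    return m, e


m, e = fp_mul((0.5, 2), (-0.75, 3))    # 2.0 x -6.0 (hypothetical operands)
print(m, e, m * 2 ** e)                # -0.75 4 -12.0
```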

• Most floating point processors perform automatic normalization so that numbers are properly shifted and aligned. The programmer just needs to take care of the overflow problem.

• However, due to the large dynamic range, scaling is rarely necessary.

• Hence, floating point processors are easier to use than fixed point processors.


COMPARISON

Between Fixed Point and Floating Point Notations

Comparison

Fixed point
• 16- or 24-bit devices
• Limited dynamic range
• Overflow and quantization errors must be resolved.
• Poorer C compiler efficiency; normally programmed in assembly.

Floating point
• 32-bit devices
• Large dynamic range
• Easier to program as no scaling is required.
• Better C compiler efficiency; can be developed in C.

Comparison (contd.)

Fixed point
• Faster clock rate
• Functional units are simpler, less silicon area required.
• Cheaper
• Lower power consumption

Floating point
• Slower clock rate
• Functional units are complex, more silicon area required.
• More expensive
• Higher power consumption
