Module 2 - comp.utm.my · Module 2: Computer Arithmetic . ... Control test Multiplier S hif trg...

89
BOOK: COMPUTER ORGANIZATION AND DESIGN,3ED, DAVID L. PATTERSON AND JOHN L. HANNESSY, MORGAN KAUFMANN PUBLISHERS 1 Module 2: Computer Arithmetic

Transcript of Module 2 - comp.utm.my · Module 2: Computer Arithmetic . ... Control test Multiplier S hif trg...

B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N , 3 E D , D A V I D L . P A T T E R S O N A N D

J O H N L . H A N N E S S Y , M O R G A N K A U F M A N N P U B L I S H E R S

1

Module 2: Computer Arithmetic

Arithmetic Operations 2

Complement

Addition

Subtraction

Multiplication

Division

Introduction 3

Numbers are represented by binary digits (bits):

How are negative numbers represented?

What is the largest number that can be represented in a computer world?

What happens if an operation creates a number bigger than can be represented?

What about fractions and real numbers?

A mystery: How does hardware really multiply or divide numbers?

Binary Numbers 4

Binary Numbers 5

Binary Number System

System Digits: 0 and 1

Bit (short for binary digit): A single binary digit

LSB (least significant bit): The rightmost bit

MSB (most significant bit): The leftmost bit

Upper Byte (or nybble): The right-hand byte (or nybble) of a pair

Lower Byte (or nybble): The left-hand byte (or nybble) of a pair

Binary Numbers 6

Binary Equivalents

1 Nybble (or nibble) = 4 bits

1 Byte = 2 nybbles = 8 bits

1 Kilobyte (KB) = 1024 bytes

1 Megabyte (MB) = 1024 kilobytes = 1,048,576 bytes

1 Gigabyte (GB) = 1024 megabytes = 1,073,741,824 bytes

Binary Numbers 7

Base 2

Binary Addition 8

Rules of Binary Addition

0 + 0 = 0

0 + 1 = 1

1 + 0 = 1

1 + 1 = 0, and carry 1 to the next more significant bit

Example: 26 +12 Carry

Binary Subtraction 9

Rules of Binary Subtraction

0 - 0 = 0

0 - 1 = 1, and borrow 1 from the next more significant bit

1 - 0 = 1

1 - 1 = 0

Example: 37 -17

borrowed

Binary Multiplication 10

Rules of Binary Multiplication

0 x 0 = 0

0 x 1 = 0

1 x 0 = 0

1 x 1 = 1,

and no carry or borrow bits

Example: 23 x 3

Another Method: Binary multiplication is the same as repeated binary addition

Binary division 11

2’s Complement 12

Two's complement representation allows the use of binary

arithmetic operations on signed integers, yielding the correct 2's

complement results.

Positive Numbers: Positive 2's complement numbers are

represented as the simple binary.

Negative Numbers: Negative 2's complement numbers are

represented as the binary number that when added to a positive

number of the same magnitude, will equals zero.

13

To represent positive and negative numbers look at the MSB (or the sign bit) •MSB = 0 means positive •MSB = 1 means negative

Calculation of 2's Complement 14

Step 1:

invert the binary equivalent of the number by changing all of the ones to zeroes and all of the zeroes to ones (also called 1's complement)

Step 2:

Then add one.

Example: -17

17 = 00010001

Step1 : 11101110

Step2 : 11101110 + 1 ----------------------- 11101111 -17 = 11101111

2's Complement Addition 15

Two's complement addition follows the same rules as binary addition.

Example: 5 + (-3) 5 = 00000101

00000101 + 11111101 ----------------------- 1 00000010 (2)

-3 = 11111101

Ignore

2's Complement Addition 16

Two's complement addition follows the same rules as binary addition.

Example: 3 + (-5) 3 = 00000011

00000011 + 11111011 ----------------------- 11111110 (-2)

-5 = 11111011

2's Complement Subtraction 17

Two's complement subtraction is the binary addition of the minuhend to the 2's complement of the subtrahend (adding a negative number is the same as subtracting a positive one).

Example : 7 – 12 7 + (-12)

minuhend subtrahend

2's Complement Subtraction 18

Example : 7 – 12 7 + (-12)

7 = 00000111

00000111 + 11110100 ----------------------- 11111011 (-5)

-12 = 11110100

-5 = 11111011

Step1 : 00000100

Step2 : 00000100 + 1 ----------------------- 00000101 (5)

CHECK!

2's Complement Multiplication 19

Two's complement multiplication follows the same rules as binary multiplication.

Example : (-4) × 4 = (-16) 4 = 00000100

11111100 x 00000100 ----------------------- 11 11110000 (-16)

-4 = 11111100

-16 = 11110000

Step1 : 00001111

Step2 : 00001111 + 1 ----------------------- 00010000 (16)

CHECK!

Ignore

2's Complement Division 20

Two's complement division is repeated 2's complement subtraction.

The 2's complement of the divisor is calculated, then added to the dividend.

For the next subtraction cycle, the quotient replaces the dividend.

This repeats until the quotient is too small for subtraction or is zero, then it becomes the remainder.

The final answer is the total of subtraction cycles plus the remainder

Example : 6/ 3 = 2

dividend

divisor

quotient

2's Complement Division 21

Example: 6/3 Example: 7/3

6 + (-3) = 3

3 + (-3) = 0

Cycle 1

Cycle 2

6 / 3 = 2

The number of cycle

7 + (-3) = 4

4 + (-3) = 1

Cycle 1

Cycle 2

7 / 3 = 2 remainder 1

2's Complement Division 22

Example: 42/6

42 + (-6) = 36

36 + (-6) = 30

Cycle 1

Cycle 2

42 / 6 = 7

30 + (-6) = 24

24 + (-6) = 18

Cycle 3

Cycle 4

18 + (-6) = 12 Cycle 5

12 + (-6) = 6

6 + (-6) = 0

Cycle 6

Cycle 7

Sign Extension 23

Extending a number representation to a larger number of bits.

Example: 2 in 8 bit binary to 16 bit binary.

In signed numbers, it is important to extend the sign bit to preserve the number (+ve or –ve)

Example: -2 in 8 bit binary to 16 bit binary.

00000010 00000000 00000010

11111110 11111111 11111110

Sign bit Sign bit Sign bit extended

Detecting Overflow in Two Complement Numbers

24

Overflow occurs when adding two positive numbers and the sum is negative, or vice versa

A carry out occurred into the sign bit

Overflow conditions for addition and subtraction

Overflow Rule for addition 25

If 2 Two's Complement numbers are added, and they both have the same sign (both positive or both negative), then overflow occurs if and only if the result has the opposite sign. Adding two positive numbers must give a positive result

Adding two negative numbers must give a negative result

Overflow never occurs when adding operands with different signs.

Overflow occurs if

(+A) + (+B) = −C

(−A) + (−B) = +C

Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)

(-7) + (-6) = (-13) but Overflow (largest −ve number is −8)

-7 = 1001

1001 + 1010 ------------ 1 0011 (3)

-6 = 1010

Overflow occurs

The sign bit has changed to +ve

Overflow Rule for Subtraction 26

If 2 Two's Complement numbers are subtracted, and their signs are different, then overflow occurs if and only if the result has the same sign as the subtrahend.

Overflow occurs if (+A) − (−B) = −C

(−A) − (+B) = +C

Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)

Subtract −6 from +7 (i.e. 7 – (-6))

result has the same sign as the subtrahend overflow happens

subtrahend

result

Overflow Rule for Subtraction 27

If 2 Two's Complement numbers are subtracted, and their signs are different, then overflow occurs if and only if the result has the same sign as the subtrahend.

Overflow occurs if (+A) − (−B) = −C

(−A) − (+B) = +C

Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)

Subtract −6 from +7 (i.e. 7 – (-6))

7 = 0111

0111 + 0110 ------------ 1101 (-3)

6 = 0110 Overflow

occurs

Result has same sign as

subtrahend

A little summary 28

Addition

Add the values, discarding any carry-out bit

Subtraction

Negate the subtrahend and add, discarding any carry-out bit

Overflow

Occurs when adding two positive numbers produces a negative result, or when adding two negative numbers produces a positive result. Adding operands of unlike signs never produces an overflow

Notice that discarding the carry out of the most significant bit during Two's Complement addition is a normal occurrence, and does not by itself indicate overflow

As an example of overflow, consider adding (80 + 80 = 160)10, which produces a result of −9610 in 8-bit two's complement.

Not 160 because the sign bit is 1. (largest +ve number in 8 bits is 127)

01010000 + 01010000 ------------------ 1010000 (-96)

Binary Multiplication

29

Two's Complement Multiplication 30

There are a couple of ways of doing two's complement multiplication by hand.

Remember that the result can require 2 times as many bits as the original operands.

“Sign Extend Method" for Two's Complement Multiplication

31

In 2's complement, sign extend both integers to twice as many bits.

Then take the correct number of result bits from the least significant portion of the result

A 4-bit example: 2 x (-3)

2= 0010

00000010 x 11111101 --------------------------- 00000010 00000000 00000010 00000010 00000010 00000010 00000010 00000010 --------------------------- 000000111111010

-3 = 1111 1101

2= 0000 0010

Sign extend to 8 bit

Correct answer underlined -6

" Partial Product Sign Extend Method " for Two's Complement Multiplication

32

Another way is to sign extend the partial products to the correct number of bits.

Sometimes we do have to make some adjustments.

Example 1: (-4) x 3

-4 = 1100

1100 x 0011 --------------------------- 11111100 1111100 000000 --------------------------- 11110100 (-12)

3= 0011

Sign extend to the correct number of bits 8

" Partial Product Sign Extend Method " for Two's Complement Multiplication

33

Sometimes we do have to make some adjustments.

If (+ve) x (+ve) then OK

If (-ve) x (+ve) then Sign extend partial products

If (+ve) x (-ve) then get additive inverse of both And then Sign extend partial products Example: 3 x (-4)

If (-ve) x (-ve) then get additive inverse of both

-3=1101; 4 =0100

1101 x 0100 --------------------------- 00000000 0000000 111101 --------------------------- 11110100 (-12)

3=0011 ;-4 = 1100

Normal stuff Like the slide before (-4)x3

" Partial Product Sign Extend Method " for Two's Complement Multiplication

34

Sometimes we do have to make some adjustments.

If (+ve) x (+ve) then OK

If (-ve) x (+ve) then Sign extend partial products

If (+ve) x (-ve) then get additive inverse of both And then Sign extend partial products Example: 3 x (-4)

If (-ve) x (-ve) then get additive inverse of both Example: (-3)x(-3)

3 = 0011

0011 x 0011 --------------------------- 0011 0011 --------------------------- 01001 (9)

-3=1101

Signed Multiplication 35

Another way to deal with signed numbers.

First convert the multiplier and multiplicand to positive numbers and then remember the original signs

Leave the sign out of the calculation

To negate the product only if the original signs disagree

1st Version of Multiplication Hardware

Flows of 1st Version Multiplication

64-bit ALU

Control test

Multiplier

Shift right

Product

Write

Multiplicand

Shift left

64 bits

64 bits

32 bits

Done

1. Test

Multiplier0

1a. Add multiplicand to product and

place the result in Product register

2. Shift the Multiplicand register left 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

Multiplicand register, product register, ALU are 64-bit wide; multiplier register is 32-bit wide

Algorithm

32-bit multiplicand starts at right half of multiplicand register

Product register is initialized at 0

Actual implementations are far more complex, and use algorithms that generate more than one bit of product each clock cycle.

Example of Multiplication – 4 bits 37

Example : 2 x 3 = ?

2 x 3 0010 x 0011

Steps:

1a – test multiplier (0 or 1) If 1 then P = P + MC

If 0 then no operation

2 – shift MC left

3 – shift MP right

All bits done? If still <max bit, repeat

If = max bit, stop

Multiplier (MP)

Multiplicand (MC)

Product (P)

Done

1. Test

Multiplier0

1a. Add multiplicand to product and

place the result in Product register

2. Shift the Multiplicand register left 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

0011

P = P + MC = 00000000 + 00000010 = 00000010

MC = 00000010 00000100

MP = 0011 0001

Max bit? NO repeat

0001

38

Iteration Step Multiplier

(MP) Multiplicand

(MC) Product (P)

0 Initial value 0011 0000 0010 0000 0000

1

1a:1P = P + MC

2: Shift MC left

3: Shift MP right

2

1a:1P = P + MC

2: Shift MC left

3: Shift MP right

3

1a:0no operation

2: Shift MC left

3: Shift MP right

4

1a:0no operation

2: Shift MC left

3: Shift MP right

0000 0010

0000 0100

0001

0000 0110

0000 1000

0000

0001 0000

0000

0010 0000

0000

Try with 5 x 4

40

Iteration Step Multiplier

(MP) Multiplicand

(MC) Product (P)

0 Initial value 0100 0000 0101 0000 0000

1

1a:0no operation

2: Shift MC left 0000 1010

3: Shift MP right 0010

2

1a:0no operation

2: Shift MC left 0001 0100

3: Shift MP right 0001

3

1a:1P = P + MC 0001 0100

2: Shift MC left 0010 1000

3: Shift MP right 0000

4

1a:0no operation

2: Shift MC left 0101 000

3: Shift MP right 0000

Example : 5 x 4

Try with 2 x (-3)

20

41

Iteration Step Multiplier

(MP) Multiplicand

(MC) Product (P)

0 Initial value 0011 1111 1110 0000 0000

1

1a:1P = P + MC 1111 1110

2: Shift MC left 1111 1100

3: Shift MP right 0001

2

1a:1P = P + MC 1111 1010

2: Shift MC left 1111 1000

3: Shift MP right 0000

3

1a:0no operation

2: Shift MC left 1111 0000

3: Shift MP right 0000

4

1a:0no operation

2: Shift MC left 1110 0000

3: Shift MP right 0000

Example : 2 x (-3) get additive inverse of both

-6

Binary Division 42

1st Version of Division

Hardware 43

64-bit ALU

Control

test

Quotient

Shift left

Remainder

Write

Divisor

Shift right

64 bits

64 bits

32 bits

Done

Test Remainder

2a. Shift the Quotient register to the left,

setting the new rightmost bit to 1

3. Shift the Divisor register right 1 bit

33rd repetition?

Start

Remainder < 0

No: < 33 repetitions

Yes: 33 repetitions

2b. Restore the original value by adding

the Divisor register to the Remainder

register and place the sum in the

Remainder register. Also shift the

Quotient register to the left, setting the

new least significant bit to 0

1. Subtract the Divisor register from the

Remainder register and place the

result in the Remainder register

Remainder > 0

Divisor register, remainder register, ALU are 64-bit wide; quotient register is 32-bit wide

Algorithm

32-bit divisor starts at left half of divisor register

Remainder register is initialized with the dividend at right

Quotient register is initialized to be 0

Flows of 1st Version Division

Divisor starts at left half of divisor register

Example of Division – 4 bits

44

Example : 7 / 2 = ?

7 / 2 0111 / 0010 Steps: 1 – Remainder (R) = R – D 2 – test new R (>=0 or <0) 2a - If >=0 then

R = no operation; Q = Shift left (add 1 at LSB)

2b - If <0 then R = D + R Q = Shift left (add 0 at LSB)

3 – shift D right All bits done?

If still <(max bit + 1), repeat If = (max bit+1), stop

Divisor (D)

Dividend (DD)

Quotient (Q)

Done

Test Remainder

2a. Shift the Quotient register to the left,

setting the new rightmost bit to 1

3. Shift the Divisor register right 1 bit

33rd repetition?

Start

Remainder < 0

No: < 33 repetitions

Yes: 33 repetitions

2b. Restore the original value by adding

the Divisor register to the Remainder

register and place the sum in the

Remainder register. Also shift the

Quotient register to the left, setting the

new least significant bit to 0

1. Subtract the Divisor register from the

Remainder register and place the

result in the Remainder register

Remainder > 0

1. R = R - D

If R = R – D = -ve

2b. R < 0; R = D + R Q = Shift left (add 0 at LSB)

3. D = Shift right 1

If R = R – D = +ve

2a. R >= 0; R = no operation Q = Shift left (add 1 at LSB)

4. If not yet 33 repeat to Step 1 (new iteration)

46

Iteration Step Quotient (Q) Divisor (D) Remainder (

R)

0 Initial value 0000 0010 0000 0000 0111

1

1. R = R - D 1110 0111

2b. R < 0; R = D + R 0000 0111

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0001 0000

2

1. R = R - D 1111 0111

2b. R < 0; R = D + R 0000 0111

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 1000

3

1. R = R - D 1111 1111

2b. R < 0; R = D + R 0000 0111

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 0100

Negate 0010 0000 1110 0000 0000 0111

1110 0000 + --------------- 1110 0111

R = R – D = R + (-D)

Example: 7/2 Divisor starts at left half of divisor register

47

Iteration Step Quotient (Q) Divisor (D) Remainder(R)

3

1. R = R - D 1111 1111

2b. R < 0; R = D + R 0000 0111

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 0100

4

1. R = R - D 0000 0011

2b. R >=0; R = no operation

Q = Shift left (add 1 at LSB) 0001

3. D = Shift right 0000 0010

5

1. R = R - D 0000 0001

2b. R >=0; R = no operation

Q = Shift left (add 1 at LSB) 0011

3. D = Shift right 0000 0001 3

1

Example: 7/2 = 3 remainder 1

48

Iteration Step Quotient (Q) Divisor (D) Remainder(R)

0 Initial value 0000 0011 0000 0000 0110

1

1. R = R - D 1101 0110

2b. R < 0; R = D + R 0000 0110

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0001 1000

2

1. R = R - D 1110 1110

2b. R < 0; R = D + R 0000 0110

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 1100

3

1. R = R - D 1111 1010

2b. R < 0; R = D + R 0000 0110

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 0110

Example: 6/3

49

Iteration Step Quotient (Q) Divisor (D) Remainder (R)

3

1. R = R - D 1111 1010

2b. R < 0; R = D + R 0000 0110

Q = Shift left (add 0 at LSB) 0000

3. D = Shift right 0000 0110

4

1. R = R - D 0000 0000

2b. R >=0; R = no operation

Q = Shift left (add 1 at LSB) 0001

3. D = Shift right 0000 0011

5

1. R = R - D 1111 1101

2b. R < 0; R = D + R 0000 0000

Q = Shift left (add 0 at LSB) 0010

3. D = Shift right 0000 0001 2

0

Example: 6/3 = 2

Signed Division 50

Signed divide: • make both divisor and dividend positive and perform

division • negate the quotient if divisor and dividend were of

opposite signs • make the sign of the remainder match that of the

dividend • this ensures always

• dividend = (quotient × divisor) + remainder quotient (x/y) = quotient (–x/y) e.g. 7 = 3 × 2 + 1

–7 = –3 × 2 – 1)

FLOATING POINT

51

Module 2 – Part 2

Floating Point Representation 52

Floating point aids in the representation of very big or very small fixed point numbers.

10000000000

1.0 x 1010

Fixed point

Floating point

976,000,000,000,000 9.76 x 1014

0.0000000000000976 9.76 x 10-14

Floating Point Numbers 53

736.637 x 1068

Significand/Fraction

Base/radix

Exponent

Decimal numbers use the radix 10, binary use 2

Significand and Exponent can be +ve or –ve.

Normalized and Unnormalized 55

In generalized normalization (like in mathematics), a floating point number is said to be normalized if the number after the radix point is a non-zero value.

Un-normalized floating number is when the number after the radix point is ‘0’.

Example:

normalized

0.1234 x 1016

0.0011 x 1015 11.0123 x 1011 0.133 x 105

unnormalized

number after the radix point is a non-zero value.

normalized

unnormalized

Normalization Process 56

Normalization is the process of deleting the zeroes until a non-zero value is detected.

Example :

A rule of thumb:

moving the radix point to the right subtract exponent

moving the radix point to the left add exponent

0.00234 x 104

Move radix point to the right (in this case 2 points)

0.234 x 104-2 0.234 x 102

12.0024 x 104

Move radix point to the left (in this case 2 points)

0.120024 x 104+2 0.234 x 106

Floating Point Format for Binary Numbers

In the beginning, the general form of floating point is :

In binary:

+/- 0.Mantissa x r +/- exponent

+/- (sign) Exponent +/- (sign) Mantissa

1 word

Mantissa = Significand

The 2 sign bits are not good for design as it incurs extra costs

Biased Exponent 59

A new value to represent the exponent without the sign bit is introduced

This will eliminate the sign for the exponent value that is the exponent will be positive. (indicative)

+/- (sign) Biased Exponent Mantissa

1 word

+/- (1 bit) Et (n bit) Mantissa, m

Biased Exponent 60

Biased value, b = 2n-1

+/- (1 bit) Et (n bit) Mantissa, m

Normalized exponent, e’ = Et - b

Where, Et = biased exponent n = bits of exponent format (i.e. the word format)

Biased exponent, Et = e’ + b

This is used unless the IEEE standard is mentioned – then is a different calculation

Conversion to Floating Point Number 61

Change to binary (if given decimal number)

Normalized the number

Change the number to biased exponent

Form the word (3 fields)

Example 3 62

Transform -33.625 to floating point word using the following format (radix 2)

Step 1 : Change 33.625 to binary

1 bit 4 bit 12 bit

Sign Biased exponent Significand

This is the given word format

33 0100001

0.625 x 2 = 1.25 1 0.25 x 2 = 0.5 0 0.5 x 2 = 1.0 1 0.0

0.625 = .101

33.625 = 0100001.101

= 0.100001101 x 2 0110

Example 3 63

Step 2 : Normalized the number

Step 3: Change the number to biased exponent

0100001.101 0.100001101 x 2 0110

Normalized exponent, e’

1 bit 4 bit 12 bit

Biased value, b = 2n-1 = 24-1 = 8d = 1000b

Biased exponent, Et = e’ + b = 0110 + 1000 = 1110

0.100001101 x 2 0110 0.100001101 x 2 1110

Example 3 64

Step 4 : Form the word (3 fields)

1 1110 100001101000

0.100001101 x 2 1110

1 bit 4 bit 12 bit Padding

Rule of thumb: -the biased exponent is always padded to the left - the significand (or mantissa) is always padded to the right

Floating-Point Representation 65

65

Value in general form: (-1) S x F x 2e

In an 8-bit representation, we can represent:

From 2 x 10-38 to 2 x 1038

This is called single-precision = 1 word

If anything goes above or under, then overflow and underflow happens respectively.

One way to reduce this is to offer another format with a larger exponent use double precision (2 words)

From 2 x 10-308 to 2 x 10308

IEEE 754 Floating-Point Standard 69

This 1 is made implicit to pack more bits into the significand

Normalized Scientific Notation 70

In IEEE standard normalization (used in computers), a floating point number is said to be normalized if there is only a single non-zero before the radix point.

Example:

123.456

there is only a single non-zero before the radix point.

normalized 1.23456 x 102

1010.1011B normalized 1.0101011 x 2011

Challenge of Negative Exponents 72

Placing the exponent before the significand simplifies sorting of floating-point numbers using integer comparison instructions.

However, using 2’s complement in the exponent field makes a negative exponent look like a big number.

sign Exponent Significand

1 bit 8 bit 23 bit

Biased Notation 73

Bias In single precision is 127 In double precision 1023

Biased notation (-1)sign x (1 + Fraction) x 2 (exponent-bias)

74

To convert a decimal number to single (or double) precision floating point:

Step 1: Normalize

Step 2: Determine Sign Bit

Step 3: Determine exponent

Step 4: Determine Significand

IEEE 754 Conversion

75

Convert 10.4d to single precision floating point.

Step 1: Normalize

IEEE 754 Conversion : Example 1

10 00001010

0.4 x 2 = 0.8 0 0.8 x 2 = 1.6 1 0.6 x 2 = 1.2 1 0.2 x 2 = 0.4 0 0.4 x 2 = 0.8 0 0.8 x 2 = 1.6 1

0.4 = .0110 10.4 = 1010.0110 x 20

For continuous results, take the 1st pattern before it repeats itself

1.0100110 x 23

76

Step 2: Determine Sign Bit (S) Because (10.4) is positive, S = 0

Step 3: Determine exponent Because its single precision bias = 127

Exponent = 3 + bias = 3 + 127 = 130d = 1000 0010b

Step4: Determine Significand

Drop the leading 1 of the significand

Then expand (padding) to 23 bits

76

IEEE 754 Conversion : Example 1

3 is from 23

1.0100110 x 23 0100110

01001100000000000000000

sign Exponent Significand

0 10000010 01001100000000000000000

77

Convert -0.75d to single precision floating point.

Step 1: Normalize

IEEE 754 Conversion : Example 2

0.75 x 2 = 1.5 1 0.5 x 2 = 1.0 1 0.0 x 2 = 0 0

- 0.75 = - 0.11

-0.11 x 20

-1.1 x 2-1

78

Step 2: Determine Sign Bit (S) Because (-0.75) is negative, S = 1

Step 3: Determine exponent Because its single precision bias = 127

Exponent = -1 + bias = -1 + 127 = 126d = 01111110b

Step4: Determine Significand

Drop the leading 1 of the significand

Then expand (padding) to 23 bits

78

IEEE 754 Conversion : Example 2

-1.1 x 2-1 0.1

10000000000000000000000

sign Exponent Significand

1 01111110 10000000000000000000000

79

Convert -0.75d to double precision floating point.

Step 1: Normalize

IEEE 754 Conversion : Example 3

0.75 x 2 = 1.5 1 0.5 x 2 = 1.0 1 0.0 x 2 = 0 0

- 0.75 = - 0.11

-0.11 x 20

-1.1 x 2-1

80

Step 2: Determine Sign Bit (S) Because (-0.75) is negative, S = 1

Step 3: Determine exponent Because its double precision bias = 1023

Exponent = -1 + bias = -1 + 1023 = 1022d = 01111111110b

Step4: Determine Significand

Drop the leading 1 of the significand

Then expand (padding) to 52 bits

80

IEEE 754 Conversion : Example 3

-1.1 x 2-1 0.1

10000000000000000000000……..00

sign Exponent (11) Significand (52)

1 01111111110 1000000000000000000..00

Converting Binary to Decimal Floating-Point

81

What decimal number is represented by this single precision float?

Extract the values:

Sign = 1

Exponent = 10000001b = 129d

Significand

The Fraction

Sign (1 bit) Exponent(8 bit) Significand(23 bit)

1 10000001 01000000000000000000000

= (0 x 2-1) + (1 x 2-2) + (0 x 2-3) = ¼ = 0.25

The number = - (1.25 x 2 (exponent-bias) ) = - (1.25 x 22) = - (1.25 x 4) = -5.0

Remember: Biased notation (-1)sign x (1 + Fraction) x 2 (exponent-bias)

-(1 + 0.25)

FLOATING-POINT OPERATIONS

82

Module 2 – Part 3

Floating-Point Addition Flows 83

Decimal Floating-Point Addition 84

• Assume 4 decimal digits for significand and 2 decimal digits for exponent

• Step 1: Align the decimal point of the number that has the smaller exponent

• Step 2: add the significand

• Step 3: Normalize the sum • check for overflow/underflow of the exponent after

normalisation

• Step 4: Round the significand • If the significand does not fit in the space reserved for it, it has to

be rounded off

• Step 5: Normalize it (if need be)

Decimal Floating-Point Addition 85

• Step 1: Align the decimal point of the number that has the smaller exponent

• Step 2: add the significand

Example: 9.999d x 101 + 1.610d x 10-1

Make 1.610d x 10-1 to 101

-1 + x = 1 x = 2 move 2 to left 0.0161d x 101

9.9990 x 101

+ 0.0161d x 101

----------------------- 10.0151 x 101

Decimal Floating-Point Addition 86

• Step 3: Normalize the sum

• Step 4: Round the significand (to 4 decimal digits for significand)

• Step 5: Normalize it (if need be)

Example: 9.999d x 101 + 1.610d x 10-1

10.0151 x 101 1.00151 x 102

1.00151 x 102 1.0015 x 102

No need as its normalized

Decimal Floating-Point Addition 88

• Adjusts the numbers

• Step 1: Align the decimal point of the number that has the smaller exponent

0.5 0.10b x 20 1.0b x 2-1

Example: 0.5d + (-0.4375d )

-0.4375 -0.0111b x 20 -1.11b x 2-2

Make 1.11b x 2-2 to 2-1

-2 + x = -1 x = 1 move 1 to left - 0.111b x 2-1

Decimal Floating-Point Addition 89

• Step 2: add the significand

• Step 3: Normalize the sum

• Step 4: Round the significand (to 4 decimal digits for significand)

• Step 5: Normalize it (if need be)

1.000 x 2-1

+ -0.111 x 2-1

---------------------- 0.001 × 2-1

Example: 0.5d + (-0.4375d )

0.001 × 2-1 1.000 × 2-3

Fits in the 4 decimal digits

No need as its normalized

Floating-Point Multiplication Flows 90

Floating-Point Multiplication 91

• Assume 4 decimal digits for significand and 2 decimal digits for exponent

• Step 1: Add the exponent of the 2 numbers

• Step 2: Multiply the significands

• Step 3: Normalize the product • check for overflow/underflow of the exponent after

normalisation

• Step 4: Round the significand • If the significand does not fit in the space reserved for it, it has to

be rounded off

• Step 5: Normalize it (if need be)

• Step 6: Set the sign of the product

Floating-Point Multiplication 92

• Assume 4 decimal digits for significand and 2 decimal digits for exponent

• Step 1: Add the exponent of the 2 numbers

• Step 2: Multiply the significands

Example: (1.110d x 1010 ) x (9.200d x 10-5)

10 + (-5) = 5 If biased is considered 10 + (-5) + 127 = 132

9.200 x 1.110 -------------- 92000 9200 9200 -------------- 10212000

10.212000 10.2120 x 105

Floating-Point Multiplication 93

• Step 3: Normalize the product

• Step 4: Round the significand (4 decimal digits for significand)

• Step 5: Normalize it (if need be)

• Step 6: Set the sign of the product

Example: (1.110d x 1010 ) x (9.200d x 10-5)

10.2120 x 105 1.02120 x 106

1.0212 x 106

Still normalized

+1.0212 x 106

Floating-Point Multiplication 95

• Assume 4 decimal digits for significand and 2 decimal digits for exponent

• Step 1: Add the exponent of the 2 numbers

• Step 2: Multiply the significands

-1 + (-2) = -3 If biased is considered -1 + (-2) + 127 = 124

1.110 x 1.000 -------------- 1110000 1.110000 1.110000 x 10-3

Example: (1.000b x 2-1 ) x (-1.110b x 2-2)

Floating-Point Multiplication 96

• Step 3: Normalize the product

• Step 4: Round the significand (4 decimal digits for significand)

• Step 5: Normalize it (if need be)

• Step 6: Set the sign of the product

Example: (1.000b x 2-1 ) x (-1.110b x 2-2

1.110000 x 10-3 already normalized

1.1100 x 10-3

Still normalized

-1.1100b x 10-3 -7/32 d

Floating-Point ALU

97

010 1 0 1

Control

Small ALU

Big ALU

Sign Exponent Significand Sign Exponent Significand

Exponent difference

Shift right

Shift left or right

Rounding hardware

Sign Exponent Significand

Increment or decrement

0 10 1

Shift smaller

number right

Compare

exponents

Add

Normalize

Round

Accurate Arithmetic 98

If a calculation exceeds the limits of the floating point scheme then CPU will flag this error.

If number is too tiny to be represented

Accurate Arithmetic : Truncation & Rounding

99

Some number have infinite decimal points (the irrational numbers) 1/3d = 0.3333333333

Truncation is done to fit the decimal points into manageable units. Truncation is where decimal values beyond the truncation point are

simply discarded and it can cause error in floating point calculations.

Rounding :If you have a number such as 3.456 then if you have to round this to 3 significant digits, the number becomes 3.46 A small error called the rounding error has occurred

***Note : the CPU will not flag any error when truncation and rounding occurs, as it is acting within its limits. programmers must assess if this will lead to significant errors