Module 2 - comp.utm.my · Module 2: Computer Arithmetic . ... Control test Multiplier S hif trg...
Transcript of Module 2 - comp.utm.my · Module 2: Computer Arithmetic . ... Control test Multiplier S hif trg...
B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N , 3 E D , D A V I D L . P A T T E R S O N A N D
J O H N L . H A N N E S S Y , M O R G A N K A U F M A N N P U B L I S H E R S
1
Module 2: Computer Arithmetic
Introduction 3
Numbers are represented by binary digits (bits):
How are negative numbers represented?
What is the largest number that can be represented in a computer world?
What happens if an operation creates a number bigger than can be represented?
What about fractions and real numbers?
A mystery: How does hardware really multiply or divide numbers?
Binary Numbers 5
Binary Number System
System Digits: 0 and 1
Bit (short for binary digit): A single binary digit
LSB (least significant bit): The rightmost bit
MSB (most significant bit): The leftmost bit
Upper Byte (or nybble): The right-hand byte (or nybble) of a pair
Lower Byte (or nybble): The left-hand byte (or nybble) of a pair
Binary Numbers 6
Binary Equivalents
1 Nybble (or nibble) = 4 bits
1 Byte = 2 nybbles = 8 bits
1 Kilobyte (KB) = 1024 bytes
1 Megabyte (MB) = 1024 kilobytes = 1,048,576 bytes
1 Gigabyte (GB) = 1024 megabytes = 1,073,741,824 bytes
Binary Addition 8
Rules of Binary Addition
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0, and carry 1 to the next more significant bit
Example: 26 +12 Carry
Binary Subtraction 9
Rules of Binary Subtraction
0 - 0 = 0
0 - 1 = 1, and borrow 1 from the next more significant bit
1 - 0 = 1
1 - 1 = 0
Example: 37 -17
borrowed
Binary Multiplication 10
Rules of Binary Multiplication
0 x 0 = 0
0 x 1 = 0
1 x 0 = 0
1 x 1 = 1,
and no carry or borrow bits
Example: 23 x 3
Another Method: Binary multiplication is the same as repeated binary addition
2’s Complement 12
Two's complement representation allows the use of binary
arithmetic operations on signed integers, yielding the correct 2's
complement results.
Positive Numbers: Positive 2's complement numbers are
represented as the simple binary.
Negative Numbers: Negative 2's complement numbers are
represented as the binary number that when added to a positive
number of the same magnitude, will equals zero.
13
To represent positive and negative numbers look at the MSB (or the sign bit) •MSB = 0 means positive •MSB = 1 means negative
Calculation of 2's Complement 14
Step 1:
invert the binary equivalent of the number by changing all of the ones to zeroes and all of the zeroes to ones (also called 1's complement)
Step 2:
Then add one.
Example: -17
17 = 00010001
Step1 : 11101110
Step2 : 11101110 + 1 ----------------------- 11101111 -17 = 11101111
2's Complement Addition 15
Two's complement addition follows the same rules as binary addition.
Example: 5 + (-3) 5 = 00000101
00000101 + 11111101 ----------------------- 1 00000010 (2)
-3 = 11111101
Ignore
2's Complement Addition 16
Two's complement addition follows the same rules as binary addition.
Example: 3 + (-5) 3 = 00000011
00000011 + 11111011 ----------------------- 11111110 (-2)
-5 = 11111011
2's Complement Subtraction 17
Two's complement subtraction is the binary addition of the minuhend to the 2's complement of the subtrahend (adding a negative number is the same as subtracting a positive one).
Example : 7 – 12 7 + (-12)
minuhend subtrahend
2's Complement Subtraction 18
Example : 7 – 12 7 + (-12)
7 = 00000111
00000111 + 11110100 ----------------------- 11111011 (-5)
-12 = 11110100
-5 = 11111011
Step1 : 00000100
Step2 : 00000100 + 1 ----------------------- 00000101 (5)
CHECK!
2's Complement Multiplication 19
Two's complement multiplication follows the same rules as binary multiplication.
Example : (-4) × 4 = (-16) 4 = 00000100
11111100 x 00000100 ----------------------- 11 11110000 (-16)
-4 = 11111100
-16 = 11110000
Step1 : 00001111
Step2 : 00001111 + 1 ----------------------- 00010000 (16)
CHECK!
Ignore
2's Complement Division 20
Two's complement division is repeated 2's complement subtraction.
The 2's complement of the divisor is calculated, then added to the dividend.
For the next subtraction cycle, the quotient replaces the dividend.
This repeats until the quotient is too small for subtraction or is zero, then it becomes the remainder.
The final answer is the total of subtraction cycles plus the remainder
Example : 6/ 3 = 2
dividend
divisor
quotient
2's Complement Division 21
Example: 6/3 Example: 7/3
6 + (-3) = 3
3 + (-3) = 0
Cycle 1
Cycle 2
6 / 3 = 2
The number of cycle
7 + (-3) = 4
4 + (-3) = 1
Cycle 1
Cycle 2
7 / 3 = 2 remainder 1
2's Complement Division 22
Example: 42/6
42 + (-6) = 36
36 + (-6) = 30
Cycle 1
Cycle 2
42 / 6 = 7
30 + (-6) = 24
24 + (-6) = 18
Cycle 3
Cycle 4
18 + (-6) = 12 Cycle 5
12 + (-6) = 6
6 + (-6) = 0
Cycle 6
Cycle 7
Sign Extension 23
Extending a number representation to a larger number of bits.
Example: 2 in 8 bit binary to 16 bit binary.
In signed numbers, it is important to extend the sign bit to preserve the number (+ve or –ve)
Example: -2 in 8 bit binary to 16 bit binary.
00000010 00000000 00000010
11111110 11111111 11111110
Sign bit Sign bit Sign bit extended
Detecting Overflow in Two Complement Numbers
24
Overflow occurs when adding two positive numbers and the sum is negative, or vice versa
A carry out occurred into the sign bit
Overflow conditions for addition and subtraction
Overflow Rule for addition 25
If 2 Two's Complement numbers are added, and they both have the same sign (both positive or both negative), then overflow occurs if and only if the result has the opposite sign. Adding two positive numbers must give a positive result
Adding two negative numbers must give a negative result
Overflow never occurs when adding operands with different signs.
Overflow occurs if
(+A) + (+B) = −C
(−A) + (−B) = +C
Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)
(-7) + (-6) = (-13) but Overflow (largest −ve number is −8)
-7 = 1001
1001 + 1010 ------------ 1 0011 (3)
-6 = 1010
Overflow occurs
The sign bit has changed to +ve
Overflow Rule for Subtraction 26
If 2 Two's Complement numbers are subtracted, and their signs are different, then overflow occurs if and only if the result has the same sign as the subtrahend.
Overflow occurs if (+A) − (−B) = −C
(−A) − (+B) = +C
Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)
Subtract −6 from +7 (i.e. 7 – (-6))
result has the same sign as the subtrahend overflow happens
subtrahend
result
Overflow Rule for Subtraction 27
If 2 Two's Complement numbers are subtracted, and their signs are different, then overflow occurs if and only if the result has the same sign as the subtrahend.
Overflow occurs if (+A) − (−B) = −C
(−A) − (+B) = +C
Example: Using 4-bit Two's Complement numbers (−8 ≤ x ≤ +7)
Subtract −6 from +7 (i.e. 7 – (-6))
7 = 0111
0111 + 0110 ------------ 1101 (-3)
6 = 0110 Overflow
occurs
Result has same sign as
subtrahend
A little summary 28
Addition
Add the values, discarding any carry-out bit
Subtraction
Negate the subtrahend and add, discarding any carry-out bit
Overflow
Occurs when adding two positive numbers produces a negative result, or when adding two negative numbers produces a positive result. Adding operands of unlike signs never produces an overflow
Notice that discarding the carry out of the most significant bit during Two's Complement addition is a normal occurrence, and does not by itself indicate overflow
As an example of overflow, consider adding (80 + 80 = 160)10, which produces a result of −9610 in 8-bit two's complement.
Not 160 because the sign bit is 1. (largest +ve number in 8 bits is 127)
01010000 + 01010000 ------------------ 1010000 (-96)
Two's Complement Multiplication 30
There are a couple of ways of doing two's complement multiplication by hand.
Remember that the result can require 2 times as many bits as the original operands.
“Sign Extend Method" for Two's Complement Multiplication
31
In 2's complement, sign extend both integers to twice as many bits.
Then take the correct number of result bits from the least significant portion of the result
A 4-bit example: 2 x (-3)
2= 0010
00000010 x 11111101 --------------------------- 00000010 00000000 00000010 00000010 00000010 00000010 00000010 00000010 --------------------------- 000000111111010
-3 = 1111 1101
2= 0000 0010
Sign extend to 8 bit
Correct answer underlined -6
" Partial Product Sign Extend Method " for Two's Complement Multiplication
32
Another way is to sign extend the partial products to the correct number of bits.
Sometimes we do have to make some adjustments.
Example 1: (-4) x 3
-4 = 1100
1100 x 0011 --------------------------- 11111100 1111100 000000 --------------------------- 11110100 (-12)
3= 0011
Sign extend to the correct number of bits 8
" Partial Product Sign Extend Method " for Two's Complement Multiplication
33
Sometimes we do have to make some adjustments.
If (+ve) x (+ve) then OK
If (-ve) x (+ve) then Sign extend partial products
If (+ve) x (-ve) then get additive inverse of both And then Sign extend partial products Example: 3 x (-4)
If (-ve) x (-ve) then get additive inverse of both
-3=1101; 4 =0100
1101 x 0100 --------------------------- 00000000 0000000 111101 --------------------------- 11110100 (-12)
3=0011 ;-4 = 1100
Normal stuff Like the slide before (-4)x3
" Partial Product Sign Extend Method " for Two's Complement Multiplication
34
Sometimes we do have to make some adjustments.
If (+ve) x (+ve) then OK
If (-ve) x (+ve) then Sign extend partial products
If (+ve) x (-ve) then get additive inverse of both And then Sign extend partial products Example: 3 x (-4)
If (-ve) x (-ve) then get additive inverse of both Example: (-3)x(-3)
3 = 0011
0011 x 0011 --------------------------- 0011 0011 --------------------------- 01001 (9)
-3=1101
Signed Multiplication 35
Another way to deal with signed numbers.
First convert the multiplier and multiplicand to positive numbers and then remember the original signs
Leave the sign out of the calculation
To negate the product only if the original signs disagree
1st Version of Multiplication Hardware
Flows of 1st Version Multiplication
64-bit ALU
Control test
Multiplier
Shift right
Product
Write
Multiplicand
Shift left
64 bits
64 bits
32 bits
Done
1. Test
Multiplier0
1a. Add multiplicand to product and
place the result in Product register
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
Multiplicand register, product register, ALU are 64-bit wide; multiplier register is 32-bit wide
Algorithm
32-bit multiplicand starts at right half of multiplicand register
Product register is initialized at 0
Actual implementations are far more complex, and use algorithms that generate more than one bit of product each clock cycle.
Example of Multiplication – 4 bits 37
Example : 2 x 3 = ?
2 x 3 0010 x 0011
Steps:
1a – test multiplier (0 or 1) If 1 then P = P + MC
If 0 then no operation
2 – shift MC left
3 – shift MP right
All bits done? If still <max bit, repeat
If = max bit, stop
Multiplier (MP)
Multiplicand (MC)
Product (P)
Done
1. Test
Multiplier0
1a. Add multiplicand to product and
place the result in Product register
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
0011
P = P + MC = 00000000 + 00000010 = 00000010
MC = 00000010 00000100
MP = 0011 0001
Max bit? NO repeat
0001
38
Iteration Step Multiplier
(MP) Multiplicand
(MC) Product (P)
0 Initial value 0011 0000 0010 0000 0000
1
1a:1P = P + MC
2: Shift MC left
3: Shift MP right
2
1a:1P = P + MC
2: Shift MC left
3: Shift MP right
3
1a:0no operation
2: Shift MC left
3: Shift MP right
4
1a:0no operation
2: Shift MC left
3: Shift MP right
0000 0010
0000 0100
0001
0000 0110
0000 1000
0000
0001 0000
0000
0010 0000
0000
Try with 5 x 4
40
Iteration Step Multiplier
(MP) Multiplicand
(MC) Product (P)
0 Initial value 0100 0000 0101 0000 0000
1
1a:0no operation
2: Shift MC left 0000 1010
3: Shift MP right 0010
2
1a:0no operation
2: Shift MC left 0001 0100
3: Shift MP right 0001
3
1a:1P = P + MC 0001 0100
2: Shift MC left 0010 1000
3: Shift MP right 0000
4
1a:0no operation
2: Shift MC left 0101 000
3: Shift MP right 0000
Example : 5 x 4
Try with 2 x (-3)
20
41
Iteration Step Multiplier
(MP) Multiplicand
(MC) Product (P)
0 Initial value 0011 1111 1110 0000 0000
1
1a:1P = P + MC 1111 1110
2: Shift MC left 1111 1100
3: Shift MP right 0001
2
1a:1P = P + MC 1111 1010
2: Shift MC left 1111 1000
3: Shift MP right 0000
3
1a:0no operation
2: Shift MC left 1111 0000
3: Shift MP right 0000
4
1a:0no operation
2: Shift MC left 1110 0000
3: Shift MP right 0000
Example : 2 x (-3) get additive inverse of both
-6
1st Version of Division
Hardware 43
64-bit ALU
Control
test
Quotient
Shift left
Remainder
Write
Divisor
Shift right
64 bits
64 bits
32 bits
Done
Test Remainder
2a. Shift the Quotient register to the left,
setting the new rightmost bit to 1
3. Shift the Divisor register right 1 bit
33rd repetition?
Start
Remainder < 0
No: < 33 repetitions
Yes: 33 repetitions
2b. Restore the original value by adding
the Divisor register to the Remainder
register and place the sum in the
Remainder register. Also shift the
Quotient register to the left, setting the
new least significant bit to 0
1. Subtract the Divisor register from the
Remainder register and place the
result in the Remainder register
Remainder > 0
–
Divisor register, remainder register, ALU are 64-bit wide; quotient register is 32-bit wide
Algorithm
32-bit divisor starts at left half of divisor register
Remainder register is initialized with the dividend at right
Quotient register is initialized to be 0
Flows of 1st Version Division
Divisor starts at left half of divisor register
Example of Division – 4 bits
44
Example : 7 / 2 = ?
7 / 2 0111 / 0010 Steps: 1 – Remainder (R) = R – D 2 – test new R (>=0 or <0) 2a - If >=0 then
R = no operation; Q = Shift left (add 1 at LSB)
2b - If <0 then R = D + R Q = Shift left (add 0 at LSB)
3 – shift D right All bits done?
If still <(max bit + 1), repeat If = (max bit+1), stop
Divisor (D)
Dividend (DD)
Quotient (Q)
Done
Test Remainder
2a. Shift the Quotient register to the left,
setting the new rightmost bit to 1
3. Shift the Divisor register right 1 bit
33rd repetition?
Start
Remainder < 0
No: < 33 repetitions
Yes: 33 repetitions
2b. Restore the original value by adding
the Divisor register to the Remainder
register and place the sum in the
Remainder register. Also shift the
Quotient register to the left, setting the
new least significant bit to 0
1. Subtract the Divisor register from the
Remainder register and place the
result in the Remainder register
Remainder > 0
–
1. R = R - D
If R = R – D = -ve
2b. R < 0; R = D + R Q = Shift left (add 0 at LSB)
3. D = Shift right 1
If R = R – D = +ve
2a. R >= 0; R = no operation Q = Shift left (add 1 at LSB)
4. If not yet 33 repeat to Step 1 (new iteration)
46
Iteration Step Quotient (Q) Divisor (D) Remainder (
R)
0 Initial value 0000 0010 0000 0000 0111
1
1. R = R - D 1110 0111
2b. R < 0; R = D + R 0000 0111
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0001 0000
2
1. R = R - D 1111 0111
2b. R < 0; R = D + R 0000 0111
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 1000
3
1. R = R - D 1111 1111
2b. R < 0; R = D + R 0000 0111
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 0100
Negate 0010 0000 1110 0000 0000 0111
1110 0000 + --------------- 1110 0111
R = R – D = R + (-D)
Example: 7/2 Divisor starts at left half of divisor register
47
Iteration Step Quotient (Q) Divisor (D) Remainder(R)
3
1. R = R - D 1111 1111
2b. R < 0; R = D + R 0000 0111
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 0100
4
1. R = R - D 0000 0011
2b. R >=0; R = no operation
Q = Shift left (add 1 at LSB) 0001
3. D = Shift right 0000 0010
5
1. R = R - D 0000 0001
2b. R >=0; R = no operation
Q = Shift left (add 1 at LSB) 0011
3. D = Shift right 0000 0001 3
1
Example: 7/2 = 3 remainder 1
48
Iteration Step Quotient (Q) Divisor (D) Remainder(R)
0 Initial value 0000 0011 0000 0000 0110
1
1. R = R - D 1101 0110
2b. R < 0; R = D + R 0000 0110
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0001 1000
2
1. R = R - D 1110 1110
2b. R < 0; R = D + R 0000 0110
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 1100
3
1. R = R - D 1111 1010
2b. R < 0; R = D + R 0000 0110
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 0110
Example: 6/3
49
Iteration Step Quotient (Q) Divisor (D) Remainder (R)
3
1. R = R - D 1111 1010
2b. R < 0; R = D + R 0000 0110
Q = Shift left (add 0 at LSB) 0000
3. D = Shift right 0000 0110
4
1. R = R - D 0000 0000
2b. R >=0; R = no operation
Q = Shift left (add 1 at LSB) 0001
3. D = Shift right 0000 0011
5
1. R = R - D 1111 1101
2b. R < 0; R = D + R 0000 0000
Q = Shift left (add 0 at LSB) 0010
3. D = Shift right 0000 0001 2
0
Example: 6/3 = 2
Signed Division 50
Signed divide: • make both divisor and dividend positive and perform
division • negate the quotient if divisor and dividend were of
opposite signs • make the sign of the remainder match that of the
dividend • this ensures always
• dividend = (quotient × divisor) + remainder quotient (x/y) = quotient (–x/y) e.g. 7 = 3 × 2 + 1
–7 = –3 × 2 – 1)
Floating Point Representation 52
Floating point aids in the representation of very big or very small fixed point numbers.
10000000000
1.0 x 1010
Fixed point
Floating point
976,000,000,000,000 9.76 x 1014
0.0000000000000976 9.76 x 10-14
Floating Point Numbers 53
736.637 x 1068
Significand/Fraction
Base/radix
Exponent
Decimal numbers use the radix 10, binary use 2
Significand and Exponent can be +ve or –ve.
Normalized and Unnormalized 55
In generalized normalization (like in mathematics), a floating point number is said to be normalized if the number after the radix point is a non-zero value.
Un-normalized floating number is when the number after the radix point is ‘0’.
Example:
normalized
0.1234 x 1016
0.0011 x 1015 11.0123 x 1011 0.133 x 105
unnormalized
number after the radix point is a non-zero value.
normalized
unnormalized
Normalization Process 56
Normalization is the process of deleting the zeroes until a non-zero value is detected.
Example :
A rule of thumb:
moving the radix point to the right subtract exponent
moving the radix point to the left add exponent
0.00234 x 104
Move radix point to the right (in this case 2 points)
0.234 x 104-2 0.234 x 102
12.0024 x 104
Move radix point to the left (in this case 2 points)
0.120024 x 104+2 0.234 x 106
Floating Point Format for Binary Numbers
In the beginning, the general form of floating point is :
In binary:
+/- 0.Mantissa x r +/- exponent
+/- (sign) Exponent +/- (sign) Mantissa
1 word
Mantissa = Significand
The 2 sign bits are not good for design as it incurs extra costs
Biased Exponent 59
A new value to represent the exponent without the sign bit is introduced
This will eliminate the sign for the exponent value that is the exponent will be positive. (indicative)
+/- (sign) Biased Exponent Mantissa
1 word
+/- (1 bit) Et (n bit) Mantissa, m
Biased Exponent 60
Biased value, b = 2n-1
+/- (1 bit) Et (n bit) Mantissa, m
Normalized exponent, e’ = Et - b
Where, Et = biased exponent n = bits of exponent format (i.e. the word format)
Biased exponent, Et = e’ + b
This is used unless the IEEE standard is mentioned – then is a different calculation
Conversion to Floating Point Number 61
Change to binary (if given decimal number)
Normalized the number
Change the number to biased exponent
Form the word (3 fields)
Example 3 62
Transform -33.625 to floating point word using the following format (radix 2)
Step 1 : Change 33.625 to binary
1 bit 4 bit 12 bit
Sign Biased exponent Significand
This is the given word format
33 0100001
0.625 x 2 = 1.25 1 0.25 x 2 = 0.5 0 0.5 x 2 = 1.0 1 0.0
0.625 = .101
33.625 = 0100001.101
= 0.100001101 x 2 0110
Example 3 63
Step 2 : Normalized the number
Step 3: Change the number to biased exponent
0100001.101 0.100001101 x 2 0110
Normalized exponent, e’
1 bit 4 bit 12 bit
Biased value, b = 2n-1 = 24-1 = 8d = 1000b
Biased exponent, Et = e’ + b = 0110 + 1000 = 1110
0.100001101 x 2 0110 0.100001101 x 2 1110
Example 3 64
Step 4 : Form the word (3 fields)
1 1110 100001101000
0.100001101 x 2 1110
1 bit 4 bit 12 bit Padding
Rule of thumb: -the biased exponent is always padded to the left - the significand (or mantissa) is always padded to the right
Floating-Point Representation 65
65
Value in general form: (-1) S x F x 2e
In an 8-bit representation, we can represent:
From 2 x 10-38 to 2 x 1038
This is called single-precision = 1 word
If anything goes above or under, then overflow and underflow happens respectively.
One way to reduce this is to offer another format with a larger exponent use double precision (2 words)
From 2 x 10-308 to 2 x 10308
Normalized Scientific Notation 70
In IEEE standard normalization (used in computers), a floating point number is said to be normalized if there is only a single non-zero before the radix point.
Example:
123.456
there is only a single non-zero before the radix point.
normalized 1.23456 x 102
1010.1011B normalized 1.0101011 x 2011
Challenge of Negative Exponents 72
Placing the exponent before the significand simplifies sorting of floating-point numbers using integer comparison instructions.
However, using 2’s complement in the exponent field makes a negative exponent look like a big number.
sign Exponent Significand
1 bit 8 bit 23 bit
Biased Notation 73
Bias In single precision is 127 In double precision 1023
Biased notation (-1)sign x (1 + Fraction) x 2 (exponent-bias)
74
To convert a decimal number to single (or double) precision floating point:
Step 1: Normalize
Step 2: Determine Sign Bit
Step 3: Determine exponent
Step 4: Determine Significand
IEEE 754 Conversion
75
Convert 10.4d to single precision floating point.
Step 1: Normalize
IEEE 754 Conversion : Example 1
10 00001010
0.4 x 2 = 0.8 0 0.8 x 2 = 1.6 1 0.6 x 2 = 1.2 1 0.2 x 2 = 0.4 0 0.4 x 2 = 0.8 0 0.8 x 2 = 1.6 1
0.4 = .0110 10.4 = 1010.0110 x 20
For continuous results, take the 1st pattern before it repeats itself
1.0100110 x 23
76
Step 2: Determine Sign Bit (S) Because (10.4) is positive, S = 0
Step 3: Determine exponent Because its single precision bias = 127
Exponent = 3 + bias = 3 + 127 = 130d = 1000 0010b
Step4: Determine Significand
Drop the leading 1 of the significand
Then expand (padding) to 23 bits
76
IEEE 754 Conversion : Example 1
3 is from 23
1.0100110 x 23 0100110
01001100000000000000000
sign Exponent Significand
0 10000010 01001100000000000000000
77
Convert -0.75d to single precision floating point.
Step 1: Normalize
IEEE 754 Conversion : Example 2
0.75 x 2 = 1.5 1 0.5 x 2 = 1.0 1 0.0 x 2 = 0 0
- 0.75 = - 0.11
-0.11 x 20
-1.1 x 2-1
78
Step 2: Determine Sign Bit (S) Because (-0.75) is negative, S = 1
Step 3: Determine exponent Because its single precision bias = 127
Exponent = -1 + bias = -1 + 127 = 126d = 01111110b
Step4: Determine Significand
Drop the leading 1 of the significand
Then expand (padding) to 23 bits
78
IEEE 754 Conversion : Example 2
-1.1 x 2-1 0.1
10000000000000000000000
sign Exponent Significand
1 01111110 10000000000000000000000
79
Convert -0.75d to double precision floating point.
Step 1: Normalize
IEEE 754 Conversion : Example 3
0.75 x 2 = 1.5 1 0.5 x 2 = 1.0 1 0.0 x 2 = 0 0
- 0.75 = - 0.11
-0.11 x 20
-1.1 x 2-1
80
Step 2: Determine Sign Bit (S) Because (-0.75) is negative, S = 1
Step 3: Determine exponent Because its double precision bias = 1023
Exponent = -1 + bias = -1 + 1023 = 1022d = 01111111110b
Step4: Determine Significand
Drop the leading 1 of the significand
Then expand (padding) to 52 bits
80
IEEE 754 Conversion : Example 3
-1.1 x 2-1 0.1
10000000000000000000000……..00
sign Exponent (11) Significand (52)
1 01111111110 1000000000000000000..00
Converting Binary to Decimal Floating-Point
81
What decimal number is represented by this single precision float?
Extract the values:
Sign = 1
Exponent = 10000001b = 129d
Significand
The Fraction
Sign (1 bit) Exponent(8 bit) Significand(23 bit)
1 10000001 01000000000000000000000
= (0 x 2-1) + (1 x 2-2) + (0 x 2-3) = ¼ = 0.25
The number = - (1.25 x 2 (exponent-bias) ) = - (1.25 x 22) = - (1.25 x 4) = -5.0
Remember: Biased notation (-1)sign x (1 + Fraction) x 2 (exponent-bias)
-(1 + 0.25)
Decimal Floating-Point Addition 84
• Assume 4 decimal digits for significand and 2 decimal digits for exponent
• Step 1: Align the decimal point of the number that has the smaller exponent
• Step 2: add the significand
• Step 3: Normalize the sum • check for overflow/underflow of the exponent after
normalisation
• Step 4: Round the significand • If the significand does not fit in the space reserved for it, it has to
be rounded off
• Step 5: Normalize it (if need be)
Decimal Floating-Point Addition 85
• Step 1: Align the decimal point of the number that has the smaller exponent
• Step 2: add the significand
Example: 9.999d x 101 + 1.610d x 10-1
Make 1.610d x 10-1 to 101
-1 + x = 1 x = 2 move 2 to left 0.0161d x 101
9.9990 x 101
+ 0.0161d x 101
----------------------- 10.0151 x 101
Decimal Floating-Point Addition 86
• Step 3: Normalize the sum
• Step 4: Round the significand (to 4 decimal digits for significand)
•
• Step 5: Normalize it (if need be)
Example: 9.999d x 101 + 1.610d x 10-1
10.0151 x 101 1.00151 x 102
1.00151 x 102 1.0015 x 102
No need as its normalized
Decimal Floating-Point Addition 88
• Adjusts the numbers
• Step 1: Align the decimal point of the number that has the smaller exponent
0.5 0.10b x 20 1.0b x 2-1
Example: 0.5d + (-0.4375d )
-0.4375 -0.0111b x 20 -1.11b x 2-2
Make 1.11b x 2-2 to 2-1
-2 + x = -1 x = 1 move 1 to left - 0.111b x 2-1
Decimal Floating-Point Addition 89
• Step 2: add the significand
• Step 3: Normalize the sum
• Step 4: Round the significand (to 4 decimal digits for significand)
• Step 5: Normalize it (if need be)
1.000 x 2-1
+ -0.111 x 2-1
---------------------- 0.001 × 2-1
Example: 0.5d + (-0.4375d )
0.001 × 2-1 1.000 × 2-3
Fits in the 4 decimal digits
No need as its normalized
Floating-Point Multiplication 91
• Assume 4 decimal digits for significand and 2 decimal digits for exponent
• Step 1: Add the exponent of the 2 numbers
• Step 2: Multiply the significands
• Step 3: Normalize the product • check for overflow/underflow of the exponent after
normalisation
• Step 4: Round the significand • If the significand does not fit in the space reserved for it, it has to
be rounded off
• Step 5: Normalize it (if need be)
• Step 6: Set the sign of the product
Floating-Point Multiplication 92
• Assume 4 decimal digits for significand and 2 decimal digits for exponent
• Step 1: Add the exponent of the 2 numbers
• Step 2: Multiply the significands
Example: (1.110d x 1010 ) x (9.200d x 10-5)
10 + (-5) = 5 If biased is considered 10 + (-5) + 127 = 132
9.200 x 1.110 -------------- 92000 9200 9200 -------------- 10212000
10.212000 10.2120 x 105
Floating-Point Multiplication 93
• Step 3: Normalize the product
• Step 4: Round the significand (4 decimal digits for significand)
• Step 5: Normalize it (if need be)
• Step 6: Set the sign of the product
Example: (1.110d x 1010 ) x (9.200d x 10-5)
10.2120 x 105 1.02120 x 106
1.0212 x 106
Still normalized
+1.0212 x 106
Floating-Point Multiplication 95
• Assume 4 decimal digits for significand and 2 decimal digits for exponent
• Step 1: Add the exponent of the 2 numbers
• Step 2: Multiply the significands
-1 + (-2) = -3 If biased is considered -1 + (-2) + 127 = 124
1.110 x 1.000 -------------- 1110000 1.110000 1.110000 x 10-3
Example: (1.000b x 2-1 ) x (-1.110b x 2-2)
Floating-Point Multiplication 96
• Step 3: Normalize the product
• Step 4: Round the significand (4 decimal digits for significand)
• Step 5: Normalize it (if need be)
• Step 6: Set the sign of the product
Example: (1.000b x 2-1 ) x (-1.110b x 2-2
1.110000 x 10-3 already normalized
1.1100 x 10-3
Still normalized
-1.1100b x 10-3 -7/32 d
Floating-Point ALU
97
010 1 0 1
Control
Small ALU
Big ALU
Sign Exponent Significand Sign Exponent Significand
Exponent difference
Shift right
Shift left or right
Rounding hardware
Sign Exponent Significand
Increment or decrement
0 10 1
Shift smaller
number right
Compare
exponents
Add
Normalize
Round
Accurate Arithmetic 98
If a calculation exceeds the limits of the floating point scheme then CPU will flag this error.
If number is too tiny to be represented
Accurate Arithmetic : Truncation & Rounding
99
Some number have infinite decimal points (the irrational numbers) 1/3d = 0.3333333333
Truncation is done to fit the decimal points into manageable units. Truncation is where decimal values beyond the truncation point are
simply discarded and it can cause error in floating point calculations.
Rounding :If you have a number such as 3.456 then if you have to round this to 3 significant digits, the number becomes 3.46 A small error called the rounding error has occurred
***Note : the CPU will not flag any error when truncation and rounding occurs, as it is acting within its limits. programmers must assess if this will lead to significant errors