Arithmetic For Computers
Mehran Rezaei
Introduction
• What happens if an operation produces a number too large to be represented in the space originally allotted?
• How does hardware multiply and divide numbers?
• What about fractions, floating-point values, and real numbers? How does a computer deal with them?
Addition and subtraction
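[Figure: 32-bit adder/subtractor built from a chain of one-bit adders; inputs A0–A31 and B0–B31, control input op (a/s), outputs R0–R31, zero flag z]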
Have you thought of performance?
[Figure: full adder with inputs a, b, cin and outputs s, cout]
Big-O notation
• $f(n)$ is $O(g(n))$
if two constants $n_0$ and $c$ can be found that satisfy
$f(n) < c\,g(n)$ for every $n > n_0$
• $g(n)$ is a simple function: $1$, $n$, $\log_2 n$, $n^2$, $n^3$, $2^n$
• The following are $O(n^2)$:
Big-O notation (cont’d)
Have you thought of performance?
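[Figure: full adder with inputs a, b, cin and outputs s, cout]
$T_{pd}(\mathrm{result}_3) = 3\,(T_{pd}(\mathrm{nand3}) + T_{pd}(\mathrm{nand2})) + T_{pd}(\mathrm{xor})$
$T_{pd}(\mathrm{result}_{n-1}) = (n-1)\,(T_{pd}(\mathrm{nand3}) + T_{pd}(\mathrm{nand2})) + T_{pd}(\mathrm{xor})$
$T_{pd}(\mathrm{result}_{n-1}) = (n-1)\cdot\mathrm{constant}_1 + \mathrm{constant}_2 = O(n)$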
Ripple carry adder
• What seems to be the problem?
• N-bit (32-bit or 64-bit) ripple carry adder
Delay = O(n)
Area = O(n)
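The O(n) delay can be seen in a small software model. The sketch below is only an illustration, not the circuit from the slides: the function names `full_adder` and `ripple_carry_add` are made up for this example, and the loop stands in for the chain of full adders, each one waiting for the carry of the previous one.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n=32, cin=0):
    """Add two n-bit unsigned values one bit at a time.

    Returns (sum mod 2**n, carry_out). Each iteration needs the carry
    produced by the previous one, which is where the O(n) delay comes from.
    """
    result = 0
    carry = cin
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(0x4E, 0x1F, n=8))  # (109, 0)
```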
Carry select adder
Tpd(32 bit CSA) = Tpd(16 bit RCA) + Tpd(multiplexer)
Courtesy of slide: Chris Terman, computational structure, MIT
Speedup: 2.5 times faster than a 32-bit ripple carry adder
(at the cost of twice as much area)
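The carry select idea can be sketched in software as follows. This is a hedged illustration under stated assumptions: `add_block` is a hypothetical stand-in for a 16-bit ripple carry adder (it simply uses integer addition), and `carry_select_add` is a name invented for this example.

```python
def add_block(x, y, cin, width):
    """Stand-in for a width-bit ripple carry adder: returns (sum, carry_out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(x, y, cin=0, width=32):
    """Add two unsigned width-bit values with a two-block carry select scheme."""
    half = width // 2
    mask = (1 << half) - 1
    lo_sum, lo_carry = add_block(x & mask, y & mask, cin, half)
    # In hardware both candidates for the upper half are computed in parallel,
    # one assuming carry-in 0 and one assuming carry-in 1.
    hi0 = add_block(x >> half, y >> half, 0, half)
    hi1 = add_block(x >> half, y >> half, 1, half)
    # The multiplexer: the lower half's carry selects one candidate.
    hi_sum, cout = hi1 if lo_carry else hi0
    return (hi_sum << half) | lo_sum, cout
```

The upper half is computed twice, which is the extra area; the delay is one half-width adder plus the final selection.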
Different flavors of adders
• Ripple Carry Adder (small waves are called ripples)
– RCA
• Carry Select Adder
– Delay: $O(\log_2 n)$
– Area: $O(n)$
• Carry Lookahead Adder (CLA)
– Delay: $O(\log_2 n)$
– Area: $O(n \log_2 n)$
• Carry skip adder
– Delay: $O(n^{1/2})$
– Area: $O(n)$
Carry Lookahead Adder
C1 = A0B0 + A0C0 + B0C0 = A0B0 + C0(A0+B0)
C2 = G1 + C1.P1
= G1 + (G0 + C0.P0).P1
C3 = G2 + C2.P2
= G2 + (G1 + G0.P1 + C0.P0.P1).P2
C4 = G3 + G2.P3 + G1.P2.P3 + G0.P1.P2.P3 + C0.P0.P1.P2.P3
Carry Lookahead Adder (cont’d)
C4 = G3 + G2.P3 + G1.P2.P3 + G0.P1.P2.P3 + C0.P0.P1.P2.P3
Delay: $O(\log_2 n)$
Area: $O(n \log_2 n)$
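The expanded carry equations can be checked with a short sketch. It assumes Gi = Ai.Bi and Pi = Ai + Bi as in the slides; the function name `cla_carries` is made up for this example, and the nested loops only spell out the sum-of-products form, they do not model gate delay.

```python
def cla_carries(a, b, c0, n=4):
    """Return [C0, C1, ..., Cn] from the expanded lookahead equations."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # Gi = Ai.Bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]  # Pi = Ai + Bi
    carries = [c0]
    for k in range(1, n + 1):
        # Ck = G(k-1) + G(k-2).P(k-1) + ... + C0.P0.P1...P(k-1)
        ck = 0
        for j in range(k):
            term = g[j]
            for m in range(j + 1, k):
                term &= p[m]
            ck |= term
        term = c0
        for m in range(k):
            term &= p[m]
        ck |= term
        carries.append(ck)
    return carries


print(cla_carries(0b1011, 0b0110, 0))  # [0, 0, 1, 1, 1]
```

Every carry depends only on the G, P and C0 inputs, never on another carry, which is what lets the hardware compute them all at once.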
Hybrid (CLA & RCA)
[Figure: four 8-bit CLA blocks, each adding two 8-bit operands to produce an 8-bit sum, chained through the carries C0, C8, C16, C24, C32]
Group Generate / Group Propagate
Final note on CLA
• Could I change Pi = Ai + Bi to Pi = Ai ⊕ Bi?
If Cin = 0 then “carry is generated”
Else “carry is propagated”
End if
Ci+1 = Gi + Ci•Pi, where Gi = Ai•Bi and Pi = Ai + Bi
With Pi = Ai ⊕ Bi:
Ci+1 = Gi + Ci•Pi
= Ai•Bi + Ci•(Ai•Bi’ + Ai’•Bi)
= Ai•Bi + Ci•Ai•Bi’ + Ci•Ai’•Bi
= Ai•Bi + Ci•Ai•Bi’ + Ai•Bi + Ci•Ai’•Bi
= Ai•(Bi + Ci•Bi’) + Bi•(Ai + Ci•Ai’)
= Ai•(Bi+Bi’)•(Bi+Ci) + Bi•(Ai+Ai’)•(Ai+Ci)
= Ai•(Bi+Ci) + Bi•(Ai+Ci)
= Ai•Bi + Ci•(Ai+Bi)
So the carry is the same with either definition of Pi.
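The algebra can also be confirmed by brute force over all eight input combinations. The two helper names below are invented for this check.

```python
from itertools import product


def carry_with_or(a, b, c):
    """Carry out using Pi = Ai + Bi."""
    return (a & b) | (c & (a | b))


def carry_with_xor(a, b, c):
    """Carry out using Pi = Ai xor Bi."""
    return (a & b) | (c & (a ^ b))


same = all(
    carry_with_or(a, b, c) == carry_with_xor(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(same)  # True
```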
Final note on CLA (cont’d)
[Figure: full adder (FA) with inputs Ai, Bi, Ci and outputs Si, Gi, Pi]
CoH = GH + CiH•PH = GH + (GL + CiL•PL)•PH
= GH + GL•PH + CiL•PL•PH
= GHL + CiL•PHL
O(log N)
8 bit CLA (with Generate and Propagate)
Addition/Subtraction and overflow
• Examples
0x4E + 0x1F
0x4E – 0x1F
• Overflow
operation   condition        result
A + B       A > 0, B > 0     < 0
A + B       A < 0, B < 0     > 0
A - B       A > 0, B < 0     < 0
A - B       A < 0, B > 0     > 0
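The overflow conditions in the table reduce to sign-bit tests. The sketch below assumes 8-bit two's complement operands passed as bit patterns; the names `add_overflows` and `sub_overflows` are hypothetical helpers for this example.

```python
def add_overflows(a, b, bits=8):
    """True if a + b overflows: operands share a sign, the result does not."""
    sign = 1 << (bits - 1)
    r = (a + b) & ((1 << bits) - 1)
    return bool(~(a ^ b) & (a ^ r) & sign)


def sub_overflows(a, b, bits=8):
    """True if a - b overflows: operands differ in sign, result sign differs from a."""
    sign = 1 << (bits - 1)
    r = (a - b) & ((1 << bits) - 1)
    return bool((a ^ b) & (a ^ r) & sign)


print(add_overflows(0x4E, 0x1F))  # False: 0x6D is still positive
print(sub_overflows(0x4E, 0x1F))  # False: 0x2F
print(add_overflows(0x4E, 0x4E))  # True: 0x9C would read as negative
```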
What does the ALU (hardware) do
when overflow happens?
• Ignore it
– The programmer is responsible for handling it
• Leave it to the OS
– Either the OS takes care of it completely
– Or it signals the application
• What does MIPS do?
– For signed operations, it throws an exception if overflow occurs
– It ignores overflow in unsigned operations
Signed addition with status
• Adder with
– Carry-in: needs an extra input bit (at the LSB)
– Carry-out: needs an extra output bit (at the MSB)
– Overflow:
• the two operands have the same sign but the sum has a different sign
– Zero
• 1 if the result is zero and there is no overflow
• 0 otherwise
– Sign (of the addition result)
If no overflow, MSB of the result
Else (MSB of the result)’
Array Multiplier
$y = \sum_{i=0}^{n-1} a \cdot b_i \cdot 2^i$
[Figure: array multiplier built up row by row from full adders (FA), one more row of adders on each successive slide]
Add and Shift Multiplier
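Add and Shift Multiplier (Cont’d)
Example: 1101 × 1011 (13 × 11 = 143); each 9-bit row is the carry out followed by the 8-bit product
1101          multiplicand
1011          multiplier
000000000     initial product
 1101         multiplier bit is 1: add the multiplicand
011010000
001101000     shift right
 1101         multiplier bit is 1: add the multiplicand
100111000
010011100     shift right
 0000         multiplier bit is 0: add nothing
010011100
001001110     shift right
 1101         multiplier bit is 1: add the multiplicand
100011110
010001111     shift right; product = 10001111 (143)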
Add and Shift Multiplier (Cont’d)
[Figure: carry flip-flop Cout and 8-bit Product register, with operands X and Y]
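for (i = 0; i < 4; i++) {
    if (Product[0] == 1)
        (Cout, Product[7:4]) <-- Product[7:4] + Y;   // add Y to the upper half, keep the carry
    ShiftRight(Cout, Product);                       // shift the 9-bit value (Cout, Product) right
}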
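The loop above can be tried out in software. The sketch assumes, as the test of Product[0] suggests, that the multiplier X starts in the low half of Product; the function name `add_shift_multiply` and the parameter `n` are made up for this example.

```python
def add_shift_multiply(x, y, n=4):
    """Unsigned n-bit multiply; x is the multiplier, y the multiplicand."""
    mask = (1 << n) - 1
    product = x & mask              # low half starts out holding the multiplier
    for _ in range(n):
        cout = 0
        if product & 1:
            high = (product >> n) + (y & mask)   # upper half + Y
            cout = high >> n                     # carry out of the n-bit add
            product = ((high & mask) << n) | (product & mask)
        # Shift the (2n+1)-bit value (Cout, Product) right by one bit.
        product = ((cout << (2 * n)) | product) >> 1
    return product


print(add_shift_multiply(0b1011, 0b1101))  # 143
```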
Data path and control unit
Data path
• Definition:
– A collection of computational components (e.g., adders/subtractors, multipliers/dividers, FP computational units, …) and memory elements (e.g., flip-flops, registers, shift registers, …) connected to each other via routing networks (buses) to meet all the requirements defined in the system’s specification
Control Unit
• Definition
– A combinational or sequential circuit that controls the flow of data in the data path to meet all the requirements defined in the system’s specification
Synchronous vs Asynchronous circuits
• Globally synchronous circuit: all memory
elements (D FFs) controlled (synchronized) by
a common global clock signal
• Globally asynchronous but locally synchronous
circuit (GALS).
• Globally asynchronous circuit
– Use D FF but not with a global clock
– Use no clock signal
Synchronous Circuit
• The big idea: synchronous methodology
– Group all D FFs together under a single clock
– Only need to deal with the timing constraint of one memory element
Routing Networks (Buses)
• Tri-state buffer:
– Output with “high-impedance”
Routing Networks (Bi – directional)
Routing Networks (Multiplexers)
What are the differences between tri-state
buffers and multiplexers?
Routing Networks (Cont’d)
[Figure: circuit diagram with terminals labeled S and D]
Routing Networks (Multiplexers)
[Figure: routing network with inputs i1 and i2, select lines sel1 and sel2, and output o]
Side note wrap up
• Methodology
– Separate data path from control unit
• Defined data path and control unit
– Synchronous circuit design
• Routing networks and buses
– Tri-state buffers
– Multiplexers as routing networks
A&S multiplier – data path
• Requirements
– 4-bit registers
• Shift register A (clear, load, and shift right)
• Shift register X (load and shift right): multiplier
• Register Y (load): multiplicand
– A flip-flop (load and clear)
– 4-bit adder
– A 2-to-1 multiplexer (if we have a 4-bit-wide output bus)
A&S multiplier – data path (cont’d)
[Figure: data path with registers A, X, and Y, carry flip-flop Cout, a 4-bit adder (result, carry out), an input bus, and an output bus; control signals: sel, plus clear/load/shift on the registers]
A&S multiplier – control unit
Examples
A catch
[Figure: two state-diagram fragments for a counter loop, n ← n − 1 with branches n = 0 and n ≠ 0; the second fragment adds a wait state]
A&S multiplier – control unit
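[Figure: control unit state diagram. Transitions are labeled start, X[0] = 1, X[0] = 0, or unconditional (-). State outputs: load X, clear Cout, clear A, load Y; load A, load Cout (add); shift A, shift X (shift); done = 1 with sel = 0; done = 1 with sel = 1]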
entity
Add & Shift Multiplier
[Figure: Multiplicand register (32 bits), 32-bit ALU, 64-bit Product register shared with the Multiplier, and Control with signals s and w]
What happens on signed
Multiplications?
Signed Multiplications
• Consider again our 4-bit multiplication, where X is the multiplicand and Y is the multiplier
– If x3 = y3 = 0
• Unsigned multiplication
– If x3 = 0 and y3 = 1
• For the first 3 steps, do the normal add and shift; in the final step, P = P – X
– If x3 = 1 and y3 = 0
• Do the normal shift until the first 1 is reached (in Y); from this point on, shift in 1 instead of 0
– If x3 = 1 and y3 = 1
• Think about this as a homework problem
Signed Multiplication X Positive and Y negative
Multiplicand 0110 (6), multiplier 1001 (–7); the multiplier is shifted right at each step
Step  Multiplier  Result                            + Product  = Product  Shift
1     1001        0110                              00000000   01100000   00110000
2     0100        0000                              00110000   00110000   00011000
3     0010        0000                              00011000   00011000   00001100
4     0001        1010 (two’s complement of 0110)   00001100   10101100   11010110
Final product: 11010110 (–42)
Booth’s algorithm (1951)
• In all of the approaches we have seen, the multiplier was examined bit by bit
• Can we take advantage of both addition and subtraction?
• In Booth’s algorithm, every pair of adjacent multiplier bits indicates the action
Booth’s algorithm (Cont’d)
[Figure: the multiplier bits 0 0 0 1 1 1 1 0 with the run of 1s annotated as end of run, middle of run, and beginning of run; the current position and the previous position are marked]
Observation: 011110 = 100000 – 000010
Booth’s algorithm (example)
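Multiplicand 0110 (6), multiplier 1001 (–7); the product register carries one extra bit on the right
Step  Bits (current, previous)  Action           Product after action  After shift
0                                                0000 1001 0
1     1 0                       Sub (add 1010)   1010 1001 0           1101 0100 1
2     0 1                       Add (add 0110)   0011 0100 1           0001 1010 0
3     0 0                       none             0001 1010 0           0000 1101 0
4     1 0                       Sub (add 1010)   1010 1101 0           1101 0110 1
Result: 1101 0110 (–42)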
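The steps of the example can be reproduced with a short sketch of Booth's algorithm for n-bit two's complement operands. The function name `booth_multiply` is invented here, and the sketch assumes the multiplicand is not the most negative n-bit value, whose negation does not fit in n bits.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Multiply two n-bit two's complement integers with Booth's algorithm.

    Assumes multiplicand > -(2**(n-1)), since its negation must fit in n bits.
    """
    mask = (1 << n) - 1
    m = multiplicand & mask
    acc = 0                      # high half of the product
    q = multiplier & mask        # low half holds the multiplier
    prev = 0                     # the extra bit to the right of the product
    for _ in range(n):
        pair = (q & 1, prev)
        if pair == (1, 0):       # beginning of a run of 1s: subtract
            acc = (acc - m) & mask
        elif pair == (0, 1):     # end of a run of 1s: add
            acc = (acc + m) & mask
        # Arithmetic shift right of (acc, q, prev) by one bit.
        prev = q & 1
        q = ((acc & 1) << (n - 1)) | (q >> 1)
        acc = (acc >> 1) | (acc & (1 << (n - 1)))
    result = (acc << n) | q
    if result >> (2 * n - 1):    # reinterpret the 2n-bit pattern as signed
        result -= 1 << (2 * n)
    return result


print(booth_multiply(6, -7))  # -42
```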
Divide: Paper & Pencil
          1001      Quotient
1000 ) 1001010      Divisor ) Dividend
      -1000
      -----
          10
          101
          1010
         -1000
         -----
            10      Remainder
Dividend ÷ Divisor = Quotient
Divisor × Quotient + Remainder = Dividend
Divide: Paper & Pencil (Cont’d)
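Example: 7 ÷ 2, or 0000 0111 ÷ 0010
• Initial values: Divisor Rg = 0010 0000, Remainder Rg = 0000 0111
• Rg = Rg – Div (add 1110 0000, the two’s complement of Div)
• Rg < 0 -> Q0 = 0 and Rg = Rg + Div (add 0010 0000 back)
• Rg >= 0 -> Q0 = 1
[Figure: Divisor, Remainder, and Quotient registers connected through an adder/subtractor (+/–)]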
Divide: Paper & Pencil (Cont’d)
• Shift Div to the right
[Figure: Divisor Rg = 0001 0000, Remainder Rg = 0000 0111, and the Quotient register, connected through an adder/subtractor (+/–)]
Divide: Paper & Pencil (Cont’d)
• Check whether iteration N+1 has been reached: if yes, done; if no, repeat
[Figure: Divisor Rg = 0001 0000, Remainder Rg = 0000 0111, and the Quotient register, with a Done test branching yes/no]
The rest of the example
Page 268, figure 4.38
V1 algorithm
Start: Place Dividend in Remainder
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder:
– Remainder ≥ 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
– Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
n+1 repetitions?
– No (< n+1 repetitions): go back to step 1
– Yes (n+1 repetitions; n = 4 here): Done
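The V1 flowchart translates almost line for line into code. The sketch below assumes unsigned n-bit operands and lets Python integers stand in for the 2n-bit registers; the function name `restoring_divide_v1` is hypothetical.

```python
def restoring_divide_v1(dividend, divisor, n=4):
    """Unsigned n-bit restoring division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = dividend                 # Start: place Dividend in Remainder
    div = divisor << n                   # Divisor starts in the upper half
    quotient = 0
    for _ in range(n + 1):               # n+1 repetitions
        remainder -= div                 # 1. Remainder = Remainder - Divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 2a. shift in a 1
        else:
            remainder += div                 # 2b. restore the old value
            quotient = quotient << 1         #     and shift in a 0
        div >>= 1                        # 3. shift Divisor right 1 bit
    return quotient & ((1 << n) - 1), remainder


print(restoring_divide_v1(7, 2))  # (3, 1)
```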
DIVIDE HARDWARE Version 1
[Figure: Divisor register (64 bits, shift right), 64-bit ALU, Remainder register (64 bits, write), Quotient register (32 bits, shift left), Control]
Observations on Divide Version 1
• Half of the bits in the divisor are always 0
=> half of the 64-bit adder is wasted
=> half of the divisor register is wasted
– Cut the divisor and the ALU in half
• Instead of shifting the divisor to the right, shift the remainder to the left?
DIVIDE HARDWARE Version 2
[Figure: Divisor register (32 bits), 32-bit ALU, Remainder register (64 bits, shift left, write), Quotient register (32 bits, shift left), Control]
Observations on Divide Version 2
• If the quotient received a 1 in the first iteration, the quotient register would not be long enough to hold the value (one bit per iteration!)
• Eliminate the Quotient register by combining it with the Remainder, which is shifted left
– Start by shifting the Remainder left as before.
– Thereafter the loop contains only two steps, because shifting the Remainder register shifts both the remainder in the left half and the quotient in the right half
– The consequence of combining the two registers and reordering the operations in the loop is that the remainder will be shifted left one time too many.
– Thus the final correction step must shift back only the remainder in the left half of the register
DIVIDE HARDWARE Version 3
[Figure: Divisor register (32 bits), 32-bit ALU, Remainder (Quotient) register (64 bits, shift left, write) split into “HI” and “LO” halves, Control]
V3 example
Page 271, figure 4.42
Observations on Divide Version 3
• Same hardware as multiply: just need an ALU to add or subtract, and a 64-bit register to shift left or shift right
• Hi and Lo registers in MIPS combine to act as a 64-bit register for multiply and divide
• Signed divides: simplest is to remember the signs, make the operands positive, and complement the quotient and remainder if necessary
– Note: Dividend and Remainder must have the same sign
– Note: Quotient is negated if the Divisor sign and Dividend sign disagree, e.g., –7 ÷ 2 = –3, remainder = –1
• Possible for the quotient to be too large: if a 64-bit integer is divided by 1, the quotient is 64 bits (called “saturation”)
Multiplication and division Inst.
Page 274, figure 4.43
Review of Numbers
• Computers are made to deal with numbers
• What can we represent in N bits?
– Unsigned integers:
$0$ to $2^N - 1$
– Signed integers (two’s complement):
$-2^{N-1}$ to $2^{N-1} - 1$
Other Numbers
• What about other numbers?
– Very large numbers? (seconds/century)
$3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^{9}$)
– Very small numbers? (atomic diameter)
$0.00000001_{10}$ ($1.0_{10} \times 10^{-8}$)
– Rationals (repeating pattern): 2/3 (0.666666666...)
– Irrationals: $2^{1/2}$ (1.414213562373...)
– Transcendentals: e (2.718...), π (3.141...)
• All represented in scientific notation
Scientific Notation Review
• Normalized form: no leading 0s (exactly one digit to the left of the decimal point)
• Alternatives for representing 1/1,000,000,000
– Normalized: $1.0 \times 10^{-9}$
– Not normalized: $0.1 \times 10^{-8}$, $10.0 \times 10^{-10}$
[Figure: $6.02 \times 10^{23}$ with its parts labeled: mantissa 6.02, decimal point, radix (base) 10, exponent 23]
Scientific Notation for Binary Numbers
• Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers
– Declare such a variable in C as float
[Figure: $1.0_{\mathrm{two}} \times 2^{-1}$ with its parts labeled: mantissa 1.0, “binary point”, radix (base) 2, exponent −1]
Floating Point Representation (1/2)
• Normal format: $+1.xxxxxxxxxx_{\mathrm{two}} \times 2^{yyyy_{\mathrm{two}}}$
• Multiple of word size (32 bits)
bit 31: S (1 bit) | bits 30–23: Exponent (8 bits) | bits 22–0: Significand (23 bits)
• S represents the sign
Exponent represents the y’s
Significand represents the x’s
• Represents numbers as small as $2.0 \times 10^{-38}$ and as large as $2.0 \times 10^{38}$
Floating Point Representation (2/2)
• What if the result is too large? ($> 2.0 \times 10^{38}$)
– Overflow!
– Overflow => exponent larger than can be represented in the 8-bit Exponent field
• What if the result is too small? ($> 0$, $< 2.0 \times 10^{-38}$)
– Underflow!
– Underflow => negative exponent larger than can be represented in the 8-bit Exponent field
• How to reduce the chances of overflow or underflow?
Double Precision Fl. Pt. Representation
Next multiple of word size (64 bits)
• Double precision (vs. single precision)
– C variable declared as double
– Represents numbers almost as small as $2.0 \times 10^{-308}$ and almost as large as $2.0 \times 10^{308}$
– But the primary advantage is greater accuracy due to the larger significand
First word: bit 31: S (1 bit) | bits 30–20: Exponent (11 bits) | bits 19–0: Significand (20 bits)
Second word: Significand (cont’d) (32 bits)
IEEE 754 Floating Point Standard
• Single Precision, DP similar
• Sign bit: 1 means negative
0 means positive
• Significand:
– To pack more bits, leading 1 implicit for normalized
numbers
– 1 + 23 bits single, 1 + 52 bits double
– always true: Significand < 1 (for normalized
numbers)
• Note: 0 has no leading 1, so reserve exponent
value 0 just for number 0
IEEE 754 Floating Point Standard
• Kahan wanted FP numbers to be usable even without FP hardware; e.g., sort records with FP numbers using integer compares
• Could break an FP number into 3 parts: compare signs, then compare exponents, then compare significands
• Wanted it to be faster: a single compare if possible, especially for positive numbers
• Then want this order:
– Highest-order bit is the sign (negative < positive)
– Exponent next, so bigger exponent => bigger number
– Significand last: exponents same => bigger significand means bigger number
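The effect of this field order can be observed with Python's standard `struct` module. The sketch restricts itself to non-negative values, the case the slide singles out; `float_bits` is a helper name made up for this example.

```python
import struct


def float_bits(x):
    """Return the single precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [3.0e10, 0.5, 2.0, 1.5, 1.0, 0.0]
by_value = sorted(values)
by_bits = sorted(values, key=float_bits)   # integer compares only
print(by_value == by_bits)  # True
```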
IEEE 754 Floating Point Standard
• Called biased notation, where the bias is the number subtracted to get the real value
– IEEE 754 uses a bias of 127 for single precision
– Subtract 127 from the Exponent field to get the actual value of the exponent
– 1023 is the bias for double precision
bit 31: S (1 bit) | bits 30–23: Exponent (8 bits) | bits 22–0: Significand (23 bits)
• $(-1)^{S} \times (1 + \mathrm{Significand}) \times 2^{(\mathrm{Exponent} - 127)}$
– Double precision is identical, except with an exponent bias of 1023
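The formula can be applied directly to a 32-bit pattern. The sketch below handles only normalized numbers and rejects the reserved exponent values; `decode_single` is a name invented for this example.

```python
def decode_single(bits):
    """Decode the 32-bit pattern of a normalized single precision number."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 or exponent == 0xFF:
        raise ValueError("zero, denormalized, infinity and NaN are not handled")
    significand = fraction / 2**23          # the 23 stored bits as a fraction < 1
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)


print(decode_single(0xBF400000))  # -0.75
```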
“Father” of the Floating point standard
IEEE Standard 754
for Binary Floating-Point Arithmetic.
www.cs.berkeley.edu/~wkahan/
…/ieee754status/754story.html
Prof. Kahan
1989
ACM Turing
Award Winner
Converting Decimal to FP
• Simple case: if the denominator is a power of 2 (2, 4, 8, 16, etc.), then it’s easy.
• Show the MIPS representation of -0.75
-0.75 = -3/4
$-11_{\mathrm{two}} / 100_{\mathrm{two}} = -0.11_{\mathrm{two}}$
Normalized to $-1.1_{\mathrm{two}} \times 2^{-1}$
$(-1)^{S} \times (1 + \mathrm{Significand}) \times 2^{(\mathrm{Exponent} - 127)}$
$(-1)^{1} \times (1 + .100\,0000 \ldots 0000) \times 2^{(126 - 127)}$
1 0111 1110 100 0000 0000 0000 0000 0000
Hairy Example
• How to represent 1/3 in MIPS?
• 1/3
= $0.33333\ldots_{10}$
= 0.25 + 0.0625 + 0.015625 + 0.00390625 + 0.0009765625 + …
= 1/4 + 1/16 + 1/64 + 1/256 + 1/1024 + …
= $2^{-2} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-10} + \ldots$
= $0.0101010101\ldots_{\mathrm{two}} \times 2^{0}$
= $1.0101010101\ldots_{\mathrm{two}} \times 2^{-2}$
Hairy Example
• Sign: 0
• Exponent = $-2 + 127 = 125_{10} = 01111101_{2}$
• Significand = 0101010101…
0 0111 1101 0101 0101 0101 0101 0101 010
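The pattern above cuts the infinite expansion off after 23 significand bits. As a hedged comparison, the sketch below uses Python's standard `struct` module to see what a round-to-nearest conversion stores instead; the helper names `single_bits` and `fields` are made up for this example.

```python
import struct


def single_bits(x):
    """Return the single precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fields(bits):
    """Format a 32-bit pattern as 'sign exponent significand'."""
    return f"{bits >> 31:01b} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"


truncated = 0x3EAAAAAA          # the pattern shown above, expansion cut off
rounded = single_bits(1 / 3)    # pattern stored after rounding to nearest

print(fields(truncated))  # 0 01111101 01010101010101010101010
print(fields(rounded))    # 0 01111101 01010101010101010101011
```

The two patterns differ only in the last significand bit, because the discarded tail 1010… is more than half a unit in the last place.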
Floating point instructions
Page 291, figure 4.47