1 CSE 45432 SUNY New Paltz Chapter Four Arithmetic and Logic Unit 32 32 32 Operation Result a b ALU.
-
date post
21-Dec-2015 -
Category
Documents
-
view
217 -
download
0
Transcript of 1 CSE 45432 SUNY New Paltz Chapter Four Arithmetic and Logic Unit 32 32 32 Operation Result a b ALU.
1CSE 45432 SUNY New Paltz
Chapter FourArithmetic and Logic Unit
3232
3232
3232
OperationOperation
ResultResult
aa
bb
ALUALU
2CSE 45432 SUNY New Paltz
• Bits are just bits (no inherent meaning)— conventions define relationship between bits and numbers
• Binary numbers (base 2)0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...decimal: 0...2n-1
• How do we represent negative numbers?i.e., which bit patterns will represent which numbers?
• Sign Magnitude Two's Complement000 = +0 000 = +0001 = +1 001 = +1010 = +2 010 = +2011 = +3 011 = +3100 = -0 100 = -4101 = -1 101 = -3110 = -2 110 = -2111 = -3 111 = -1
• Which one is best? Why?
Numbers
3CSE 45432 SUNY New Paltz
• 32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten
...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten
...1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
• Converting n bit numbers into numbers with more than n bits:
– MIPS 16 bit immediate gets converted to 32 bits for arithmetic
– copy the most significant bit (the sign bit) into the other bits
0010 -> 0000 0010 1010 -> 1111 1010
– "sign extension" (lbu vs. lb)
max
min
MIPS
4CSE 45432 SUNY New Paltz
Overflow
• Examples: 7 + 3 = 10 but ...
• - 4 - 5 = - 9 but ...
2’s ComplementBinaryDecimal
0 0000
1 0001
2 0010
3 0011
0000
1111
1110
1101
Decimal
0
-1
-2
-3
4 0100
5 0101
6 0110
7 0111
1100
1011
1010
1001
-4
-5
-6
-7
1000-8
0 1 1 1
0 0 1 1+
1 0 1 0
1
1 1 0 0
1 0 1 1+
0 1 1 1
110
7
3
1
– 6
– 4
– 5
7
0
5CSE 45432 SUNY New Paltz
• Overflow (result too large for finite computer word):
– e.g., adding two n-bit numbers does not yield an n-bit number
• Note that overflow term is somewhat misleading, it does not mean a carry “overflowed”Note that overflow term is somewhat misleading, it does not mean a carry “overflowed”• No overflow when adding a positive and a negative number• No overflow when signs are the same for subtraction• Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a negative – or, adding two negatives gives a positive– or, subtract a negative from a positive and get a negative– or, subtract a positive from a negative and get a positive
• In MIPS add, addi, sub cause exception (interrupt) on overflow– Details based on software system
• Don't always want to detect overflow– new MIPS instructions: addu, addiu, subu
Detecting Overflow
6CSE 45432 SUNY New Paltz
Building a 32 bit ALU
Let's look at a 1-bit ALU for addition:
How could we build a 1-bit ALU for add, and, and or?
a
b
Sum = a b cin Cout = a b + (a b) cin
Cout = a b + a cin + bcin
Result31a31
b31
Result0
CarryIn
a0
b0
Result1a1
b1
Result2a2
b2
Operation
ALU0
CarryIn
CarryOut
ALU1
CarryIn
CarryOut
ALU2
CarryIn
CarryOut
ALU31
CarryIn
+
Carry In
CarryOut
Sum
b
0
2
a
1
+
Result
OperationCarry In
CarryOut
7CSE 45432 SUNY New Paltz
• Two's complement approach:
• a - b = a + b + 1
What about subtraction (a – b) ?
0
2
Result
Operation
a
1
CarryIn
CarryOut
0
1
Binvert
b
8CSE 45432 SUNY New Paltz
Overflow Detection Logic
• Carry into MSB XOR Carry out of MSB
– For a N-bit ALU: Overflow = CarryIn[N - 1] CarryOut[N - 1]
a0
b0
1-bitALU
Result0
CarryIn0
CarryOut0
a1
b1
1-bitALU
Result1
CarryIn1
CarryOut1
a2
b2
1-bitALU
Result2
CarryIn2
a3
b3
1-bitALU
Result3
CarryIn3
CarryOut3
Overflow
X Y X XOR Y
0 0 0
0 1 1
1 0 1
1 1 0
Supporting slt
• Need to support the set-on-less-than instruction (slt)
– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b and use sign bit
Overflow
0
3
Result
Operation
a
1
CarryIn
CarryOut
0
1
Binvert
b 2
Less
LSB.
0
3
Result
Operation
a
1
CarryIn
0
1
Binvert
b 2
Less
Sign
Overflowdetection
MSB
10CSE 45432 SUNY New Paltz
Seta31
0
ALU0 Result0
CarryIn
a0
Result1a1
0
Result2a2
0
Operation
b31
b0
b1
b2
Result31
Overflow
Binvert
CarryIn
Less
CarryIn
CarryOu t
ALU1Less
CarryIn
CarryOut
ALU2Less
CarryIn
CarryOut
ALU31Less
CarryIn (sign)
11CSE 45432 SUNY New Paltz
Test for equality
• Notice control lines:
000 = and001 = or010 = add110 = subtract111 = slt
Seta31
0
Result0a0
Result1a1
0
Result2a2
0
Operation
b31
b0
b1
b2
Result31
Overflow
Bnegate
Zero
ALU0Less
Ca rryIn
CarryOut
ALU1Less
CarryIn
CarryOut
ALU2Less
CarryIn
CarryOut
ALU31Less
CarryIn
12CSE 45432 SUNY New Paltz
Conclusion
• We can build an ALU to support the MIPS instruction set
– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series(on the “critical path” or the “deepest level of logic”)
• Our primary focus: comprehension, however,– Clever changes to organization can improve performance
(similar to using better algorithms in software)– we’ll look at two examples for addition and multiplication
13CSE 45432 SUNY New Paltz
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1 c2 =
c3 = b2c2 + a2c2 + a2b2 c3 =
c4 = b3c3 + a3c3 + a3b3 c4 =
Not feasible! Why?
Problem: ripple carry adder is slow
14CSE 45432 SUNY New Paltz
• An approach in-between our two extremes
• c1 = b0c0 + a0c0 + a0b0 = (b0 + a0)c0 + a0b0
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? gi = ai bi
– When would we propagate the carry? pi = ai + bi
• Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1 c2 =
c3 = g2 + p2c2 c3 =
c4 = g3 + p3c3 c4 =
Carry-lookahead adder
15CSE 45432 SUNY New Paltz
Carry Look Ahead (Design trick: peek)
a0
b0
S0
gp
g = a bp = a + b
a1
b1
S1gp
a2
b2
S2
gp
a3
b3
S3
gp
cin
c1= g0 +p0 c0
c2 = g1 + p1 g0 + p1p0c0
c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1p0 c0
G
C4 = . . .
P
16CSE 45432 SUNY New Paltz
Plumbing as Carry Lookahead Analogy
p0
c0g0
c1
p0
c0g0
p1g1
c2
p0
c0g0
p1g1
p2g2
p3g3
c4
17CSE 45432 SUNY New Paltz
To build bigger adders
CarryIn
Result0--3
ALU0
CarryIn
Result4--7
ALU1
CarryIn
Result8--11
ALU2
CarryIn
CarryOut
Result12--15
ALU3
CarryIn
C1
C2
C3
C4
P0G0
P1G1
P2G2
P3G3
pigi
pi + 1gi + 1
ci + 1
ci + 2
ci + 3
ci + 4
pi + 2gi + 2
pi + 3gi + 3
a0b0a1b1a2b2a3b3
a4b4a5b5a6b6a7b7
a8b8a9b9
a10b10a11b11
a12b12a13b13a14b14a15b15
• Can’t build a 16 bit adder this way .. (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again
18CSE 45432 SUNY New Paltz
Cascaded Carry Look-ahead (16-bit)
CLA
4-bitAdder
4-bitAdder
4-bitAdder
C1 = G0 + P0 C0
C2 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0 GP
G0P0
C4 = . . .
C0
19CSE 45432 SUNY New Paltz
2nd level Carry, Propagate as Plumbing
p0g0
p1g1
p2g2
p3g3
G0
p1
p2
p3
P0
20CSE 45432 SUNY New Paltz
Carry Lookahead Example
Example: Determine the gi, pi, Pi, and Gi values of the following two 16 bit numbers. What is Cout15 (C16)?
a: 0001 1010 0011 0011
b: 1110 0101 1110 1011
pi = ai + bi
gi = ai bi
ci
– Repeat Using Pi and Gi
P0= P1= P2= P3=
G0 =
G1 =
G2 =
G3 =
C4 =
21CSE 45432 SUNY New Paltz
Speed of Ripple Carry Versus Carry Lookahead
• One simple way to model time for logic is to assume each AND and OR gate takes the same time for a signal to pass through it. Time is estimated by simply counting the number of gates along the longest path through a piece of logic.Compare the number of gate delays for the critical paths of two 16-bit adders, one using ripple carry and one using two-level carry lookahead.
22CSE 45432 SUNY New Paltz
Other Design Tricks: Guess
n-bit adder n-bit addern-bit adder 1 0
n-bit adder n-bit adder
Cout Carry-select adderCarry-select adder
23CSE 45432 SUNY New Paltz
• Let's look at 3 versions based on grade school algorithm
0010 (multiplicand)__x_1011 (multiplier)
0010 0010 0000 0010 0010110• Negative numbers: convert and multiply
– there are better techniques (i.e. Booth Algorithm), we won’t look at them• m bits x n bits = m+n bit product• Binary makes it easy:
– 0 => place 0 ( 0 x multiplicand)– 1 => place a copy ( 1 x multiplicand)
Multiplication
24CSE 45432 SUNY New Paltz
Multiplication (version 1)
• 64-bit Multiplicand register, 64-bit ALU, 64-bit Product register, 32-bit multiplier register
Product
Multiplier
Multiplicand
64-bit ALU
Shift Left
Shift Right
WriteControl
32 bits
64 bits
64 bits
Multiplier = datapath + controlMultiplier = datapath + control
25CSE 45432 SUNY New Paltz
Multiplication Algorithm Version 1
• Product Multiplier Multiplicand 0000 0000 0011 0000 0010
• 0000 0010 0001 0000 0100
• 0000 0110 0000 0000 1000
• 0000 01103. Shift the Multiplier register right 1 bit
DoneYes: 32 repetitions
2. Shift the Multiplicand register left 1 bit
No: < 32 repetitions
1. TestMultiplier0
Multiplier0 = 0Multiplier0 = 1
1a. Add multiplicand to product & place the result in Product register
32nd repetition?
Start
26CSE 45432 SUNY New Paltz
Observations on Multiplication: Version 1
Done
1. TestMultiplier0
1a. Add multiplicand to product andplace the result in Product register
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
64-bit ALU
Control test
MultiplierShift right
ProductWrite
MultiplicandShift left
64 bits
64 bits
32 bits
1 clock per cycle => 100 clocks per multiplyRatio of multiply to add 5:1 to 100:1
1/2 bits in multiplicand always 0=> 64-bit adder is wasted0’s inserted in left of multiplicand as shifted=> least significant bits of product never changed once formedInstead of shifting multiplicand to left, shift product to right?
27CSE 45432 SUNY New Paltz
Multiplication Version 2
32-bit Multiplicand register, 32-bit ALU, 64-bit Product register, 32-bit Multiplier register
Product
Multiplier
Multiplicand
32-bit ALU
Shift Right
WriteControl
32 bits
32 bits
64 bits
Shift Right
28CSE 45432 SUNY New Paltz
Multiplication Algorithm Version 2
Multiplier Multiplicand Product0011 0010 0000 0000
3. Shift the Multiplier register right 1 bit.
DoneYes: 32 repetitions
2. Shift the Product register right 1 bit.
No: < 32 repetitions
1. TestMultiplier0
Multiplier0 = 0Multiplier0 = 1
1a. Add multiplicand to the left half of product & place the result in the left half of Product register
32nd repetition?
Start
Product Multiplier Multiplicand 0000 0000 0011 0010
30CSE 45432 SUNY New Paltz
Observations on Multiplication Version 2
MultiplierShift right
Write
32 bits
64 bits
32 bits
Shift right
Multiplicand
32-bit ALU
Product Control test
Done
1. TestMultiplier0
1a. Add multiplicand to the left half ofthe product and place the result inthe left half of the Product register
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
Product register wastes space that exactly matches size of multiplierCombine Multiplier register and Product register
31CSE 45432 SUNY New Paltz
Multiplication Version 3
• 32-bit Multiplicand register, 32 -bit ALU, 64-bit Product register, (0-bit Multiplier register)
Product (Multiplier)
Multiplicand
32-bit ALU
WriteControl
32 bits
64 bits
Shift Right
32CSE 45432 SUNY New Paltz
Multiplication Algorithm Version 3
Multiplicand Product0010 0000 0011
DoneYes: 32 repetitions
2. Shift the Product register right 1 bit.
No: < 32 repetitions
1. TestProduct0
Product0 = 0Product0 = 1
1a. Add multiplicand to the left half of product & place the result in the left half of Product register
32nd repetition?
Start
33CSE 45432 SUNY New Paltz
Observations on Final Version
ControltestWrite
32 bits
64 bits
Shift rightProduct
Multiplicand
32-bit ALU
Done
1. TestProduct0
1a. Add multiplicand to the left half ofthe product and place the result inthe left half of the Product register
2. Shift the Product register right 1 bit
32nd repetition?
Start
Product0 = 0Product0 = 1
No: < 32 repetitions
Yes: 32 repetitions
• 2 steps per bit because Multiplier & Product combined• How can you make it faster?• What about signed multiplication?
– Booth’s Algorithm
34CSE 45432 SUNY New Paltz
Unsigned Combinational Multiplier
• Stage i accumulates A * 2 i if Bi == 1
• Q: How much hardware for 32 bit multiplier? Critical path?
B0
A0A1A2A3
A0A1A2A3
A0A1A2A3
A0A1A2A3
B1
B2
B3
P0P1P2P3P4P5P6P7
0 0 0 0
35CSE 45432 SUNY New Paltz
Floating Point (a brief look)
• We need a way to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., 3.15576 109
• Representation:
– sign, exponent, significand: (–1)sign significand 2exponent
– more bits for significand gives more accuracy
– more bits for exponent increases range
• IEEE 754 floating point standard:
– single precision: sign bit 8 bit exponent 23 bit significand
– double precision: sign bit 11 bit exponent 52 bit significand
36CSE 45432 SUNY New Paltz
IEEE 754 floating-point standard
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier
– All 0s is smallest exponent all 1s is largest
– Bias of 127 for single precision and 1023 for double precision
– Summary: (–1)sign significand) 2 exponent – bias
• Example:
– Decimal: -.75 = -3/4 = -3/22
– Binary: -.11 = -1.1 x 2-1
– Floating point: exponent = 126 = 01111110
– Single precision: sign bit 8 bit exponent 23 bit significand
– IEEE single precision: 1 01111110 10000000000000000000000
37CSE 45432 SUNY New Paltz
Floating-Point Arithmetic
• Addition example: three digit significand
– 9.999 101 + 1.610 10-1.
– Step 1: Align ==> 9.999 101 + 0. 01610 101
– Step 2: Add Significand ==> 9.999
0.016
10.015
– Step 3: Normalize: ==> 1.0015 102
– Step 4: Round: ==> 1.002 102
• Multiplication example: 1.110 1010 + 9.200 10-5
– Step 1: Add exponents (10 + 127) + (-5 + 127) - 127 = (5 + 127)
– Step 2: Multiply significand 1.110 9.200 = 10.212000
– Step 3: Normalize 10.212000 105 = 1.021 106
– Step 4: Sign of product + 1.021 106
38CSE 45432 SUNY New Paltz
Floating Point Complexities
• Operations are somewhat more complicated
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– four rounding modes: round up, round down, truncate, nearest even
– positive divided by zero yields “infinity” (see page 300)
– zero divide by zero yields “not a number” (see page 300)
– other complexities
39CSE 45432 SUNY New Paltz
Chapter Four Summary
• Computer arithmetic is constrained by limited precision
• Bit patterns have no inherent meaning but standards do exist
– two’s complement
– IEEE 754 floating point
• Computer instructions determine “meaning” of the bit patterns
• Performance and accuracy are important so there are manycomplexities in real machines (i.e., algorithms and implementation).
40CSE 45432 SUNY New Paltz
Barrel Shifter
Technology-dependent solutions: transistor per switch
D3
D2
D1
D0
A6
A5
A4
A3 A2 A1 A0
SR0SR1SR2SR3
41CSE 45432 SUNY New Paltz
Motivation for Booth’s Algorithm• Example 2 x 6 = 0010 x 0110:
0010 x 0110 + 0000 shift (0 in multiplier)
+ 0010 add (1 in multiplier)
+ 0100 add (1 in multiplier)
+ 0000 shift (0 in multiplier)
00001100
• ALU with add or subtract gets same result in more than one way:
• 6= – 2 + 8 0110 = – 00010 + 01000 = 11110 + 01000
• For example
• 0010
• x 0110 0000 shift (0 in multiplier)
• – 0010 sub (first 1 in multpl.)
• 0000 shift (mid string of 1s)
• + 0010 add (prior step had last 1) 00001100
42CSE 45432 SUNY New Paltz
Booth’s Algorithm
Current Bit Bit to the Right Explanation Example Op
1 0 Begins run of 1s 0001111000 sub
1 1 Middle of run of 1s 0001111000 none
0 1 End of run of 1s 0001111000 add
0 0 Middle of run of 0s 0001111000 none
Originally for Speed (when shift was faster than add)
• Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one
0 1 1 1 1 0beginning of runend of run
middle of run
–1+ 10000
01111
43CSE 45432 SUNY New Paltz
Booths Example (2 x 7)
1a. P = P - m 1110 + 11101110 0111 0 shift P (sign ext)
1b. 0010 1111 0011 1 11 -> nop, shift
2. 0010 1111 1001 1 11 -> nop, shift
3. 0010 1111 1100 1 01 -> add
4a. 0010 + 0010 0001 1100 1 shift
4b. 0010 0000 1110 0 done
Operation Multiplicand Product next?
0. initial value 0010 0000 0111 0 10 -> sub
44CSE 45432 SUNY New Paltz
Booths Example (2 x -3)
1a. P = P - m 1110 + 11101110 1101 0 shift P (sign
ext)
1b. 0010 1111 0110 1 01 -> add + 0010
2a. 0001 0110 1 shift P
2b. 0010 0000 1011 0 10 -> sub +1110
3a. 0010 1110 1011 0 shift
3b. 0010 1111 0101 1 11 -> nop4a 1111 0101 1 shift
4b. 0010 1111 1010 1 done
Operation Multiplicand Product next?
0. initial value 0010 0000 1101 0 10 -> sub
45CSE 45432 SUNY New Paltz
How does it work?
• at each stage shift A left ( x 2)
• use next bit of B to determine whether to add in shifted multiplicand
• accumulate 2n bit partial product at each stage
B0
A0A1A2A3
A0A1A2A3
A0A1A2A3
A0A1A2A3
B1
B2
B3
P0P1P2P3P4P5P6P7
0 0 0 00 0 0