lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page...
Transcript of lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page...
![Page 1: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/1.jpg)
EE141
EECS151/251ASpring2020 DigitalDesignandIntegratedCircuitsInstructor:JohnWawrzynek
Lecture 18:Multiplier Circuits, Counters, Shifters
![Page 2: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/2.jpg)
Page
Warmup• Recall long multiplication of base-10 by hand:
• In base-2 (binary), we do the same thing:
!2
12 56x
011 101x
![Page 3: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/3.jpg)
Page
Multiplication a3 a2 a1 a0 Multiplicand b3 b2 b1 b0 Multiplier
X a3b0 a2b0 a1b0 a0b0
a3b1 a2b1 a1b1 a0b1 Partial
a3b2 a2b2 a1b2 a0b2 products a3b3 a2b3 a1b3 a0b3
. . . a1b0+a0b1 a0b0 Product
Many different circuits exist for multiplication. Each one has a different balance between speed (performance) and amount of logic (cost).
!3
![Page 4: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/4.jpg)
Control Algorithm: 1. P ← 0, A ← multiplicand, B ← multiplier 2. If LSB of B==1 then add A to P else add 0 3. Shift [P][B] right 1 4. Repeat steps 2 and 3 n-1 more
times. 5. [P][B] has product.
Page
“Shift and Add” Multiplier• Sums each partial
product, one at a time. • In binary, each partial
product is shifted versions of A or 0.
• Cost α n, Τ = n clock cycles. • What is the critical path for
determining the min clock period?
!4
![Page 5: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/5.jpg)
Page
“Shift and Add” MultiplierSigned Multiplication: Remember for 2’s complement numbers MSB has negative weight:
ex: -6 = 110102 = 0•20 + 1•21 + 0•22 + 1•23 - 1•24
= 0 + 2 + 0 + 8 - 16 = -6
• Therefore for multiplication: a) subtract final partial product b) sign-extend partial products • Modifications to shift & add circuit: a) adder/subtractor b) sign-extender on P shifter register
!5
![Page 6: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/6.jpg)
Page
Convince yourself• What’s -3 x 5?
!6
1101 0101x
![Page 7: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/7.jpg)
EE141
Outline for Multipliers❑ Combinational multiplier ❑ Latency & Throughput ▪ Wallace Tree ▪ Pipelining to increase
throughput ❑ Smaller multipliers ▪ Booth encoding ▪ Serial, bit-serial
❑ Two’s complement multiplier
7
![Page 8: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/8.jpg)
EE141
Unsigned Combinational
Multiplier
![Page 9: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/9.jpg)
Page
Array Multiplier
Each row: n-bit adder with AND gates
What is the critical path?
Single cycle multiply: Generates all n partial products simultaneously.
!9
![Page 10: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/10.jpg)
Page
Carry-Save Addition• Speeding up multiplication is a
matter of speeding up the summing of the partial products.
• “Carry-save” addition can help. • Carry-save addition passes
(saves) the carries to the output, rather than propagating them.
• Example: sum three numbers, 310 = 0011, 210 = 0010, 310 = 0011
310 0011 + 210 0010 c 0100 = 410 s 0001 = 110
310 0011 c 0010 = 210
s 0110 = 610
1000 = 810
carry-save add
carry-save add
carry-propagate add
• In general, carry-save addition takes in 3 numbers and produces 2. • Sometimes called a “3:2 compressor”: 3 input signals into 2 in a potentially lossy
operation • Whereas, carry-propagate takes 2 and produces 1. • With this technique, we can avoid carry propagation until final addition
!10
![Page 11: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/11.jpg)
Page
Carry-save Circuits
• When adding sets of numbers, carry-save can be used on all but the final sum.
• Standard adder (carry propagate) is used for final sum.
• Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder)
!11
![Page 12: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/12.jpg)
Page
Array Multiplier using Carry-save Addition
Fast carry-propagate adder
!12
![Page 13: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/13.jpg)
Page
Carry-save AdditionCSA is associative and commutative. For example: (((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))
• A balanced tree can be used to reduce the logic delay.
• It doesn’t matter where you add the carries and sums, as long as you eventually do add them.
• This structure is the basis of the Wallace Tree Multiplier.
• Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
• Multiplier delay α log3/2N + log2N
!13
![Page 14: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/14.jpg)
EE141
Increasing Throughput: Pipelining
= register
Idea: split processing across several clock cycles by dividing circuit into pipeline stages separated by registers that hold values passing from one stage to the next.
14
![Page 15: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/15.jpg)
EE141
Smaller Combinational Multipliers
![Page 16: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/16.jpg)
EE141
Booth Recoding: Higher-radix mult.
AN-1 AN-2 … A4 A3 A2 A1 A0 BM-1 BM-2 … B3 B2 B1 B0x
...
2M/2
BK+1,K*A = 0*A → 0 = 1*A → A = 2*A → 4A – 2A = 3*A → 4A – A
Idea: If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and halve the latency of the multiplier!
Booth’s insight: rewrite 2*A and 3*A cases, leave 4A for next partial product to do! 16
![Page 17: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/17.jpg)
EE141
Booth recoding
BK+1 0 0 0 0 1 1 1 1
BK 0 0 1 1 0 0 1 1
BK-1 0 1 0 1 0 1 0 1
action
add 0add A add A
add 2*A sub 2*A sub A sub A add 0
A “1” in this bit means the previous stage needed to add 4*A. Since this stage is shifted by 2 bits with respect to the previous stage, adding 4*A in the previous stage is like adding A in this stage!
-2*A+A
-A+A
from previous bit paircurrent bit pair
BK+1,K*A = 0*A → 0 = 1*A → A = 2*A → 4A – 2A = 3*A → 4A – A
17
![Page 18: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/18.jpg)
EE141
Example
18
![Page 19: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/19.jpg)
Page
Bit-serial Multiplier• Bit-serial multiplier (n2 cycles, one bit of result per n cycles):
• Control Algorithm:
repeat n cycles { // outer (i) loop repeat n cycles{ // inner (j) loop shiftA, selectSum, shiftHI } shiftB, shiftHI, shiftLOW, reset }
Note: The occurrence of a control signal x means x=1. The absence of x means x=0.
!19
![Page 20: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/20.jpg)
EE141
Signed Multipliers
![Page 21: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/21.jpg)
EE141
Combinational Multiplier (signed!)
X * Y = (-3) * (-2)
(-3) 1 0 1 (X) (-2) * 1 1 0 (Y) -------------------- 0 0 0 0 0 0 Y0*X = 0 + 1 1 1 0 1 2Y1*X = -6 - 1 1 0 1 4Y2*X = -12 ---------------------- (+6) 0 0 0 1 1 0
21
![Page 22: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/22.jpg)
EE141
Combinational Multiplier (signed) X3 X2 X1 X0 * Y3 Y2 Y1 Y0 -------------------- X3Y0 X3Y0 X3Y0 X3Y0 X3Y0 X2Y0 X1Y0 X0Y0 + X3Y1 X3Y1 X3Y1 X3Y1 X2Y1 X1Y1 X0Y1 + X3Y2 X3Y2 X3Y2 X2Y2 X1Y2 X0Y2 - X3Y3 X3Y3 X2Y3 X1Y3 X0Y3 ----------------------------------------- Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
x3
FA
x2
FA
x1
FA
x2
FA
x1
FA
x0
FA
x1
HA
x0
HA
x0
FA
x3
FA
x2
FA
x3
x3 x2 x1 x0
z0
z1
z2
z3z4z5z6z7
y3
y2
y1
y0
FAFAFA
FA
FA
FA
FA
1There are tricks we can use to eliminate the extra circuitry we added… 22
![Page 23: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/23.jpg)
EE141
2’s Complement Multiplication(Baugh-Wooley)
X3 X2 X1 X0 * Y3 Y2 Y1 Y0 -------------------- X3Y0 X3Y0 X3Y0 X3Y0 X3Y0 X2Y0 X1Y0 X0Y0 + X3Y1 X3Y1 X3Y1 X3Y1 X2Y1 X1Y1 X0Y1 + X3Y2 X3Y2 X3Y2 X2Y2 X1Y2 X0Y2 - X3Y3 X3Y3 X2Y3 X1Y3 X0Y3 ----------------------------------------- Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
X3Y0 X2Y0 X1Y0 X0Y0 + X3Y1 X2Y1 X1Y1 X0Y1 + X2Y2 X1Y2 X0Y2 + X3Y3 X2Y3 X1Y3 X0Y3 + 1 1
Step 1: two’s complement operands so high order bit is –2N-1. Must sign extend partial products and subtract the last one
Step 2: don’t want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1).
Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!
Step 4: finish computing the constants…
Result: multiplying 2’s complement operands takes just about same amount of hardware as multiplying unsigned operands!
X3Y0 X2Y0 X1Y0 X0Y0 + X3Y1 X2Y1 X1Y1 X0Y1 + X2Y2 X1Y2 X0Y2 + X3Y3 X2Y3 X1Y3 X0Y3 + + 1 - 1 1 1 1
X3Y0 X3Y0 X3Y0 X3Y0 X3Y0 X2Y0 X1Y0 X0Y0 + 1 + X3Y1 X3Y1 X3Y1 X3Y1 X2Y1 X1Y1 X0Y1 + 1 + X3Y2 X3Y2 X3Y2 X2Y2 X1Y2 X0Y2 + 1 + X3Y3 X3Y3 X2Y3 X1Y3 X0Y3 + 1 + 1 - 1 1 1 1
–B = ~B + 1
23
![Page 24: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/24.jpg)
EE141
2’s Complement Multiplication
FA
x3
FA
x2
FA
x1
FA
x2
FA
x1
HA
x0
FA
x1
HA
x0
HA
x0
FA
x3
FA
x2
FA
x3
HA
1
1
x3 x2 x1 x0
z0
z1
z2
z3z4z5z6z7
y3
y2
y1
y0
24
![Page 25: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/25.jpg)
Page
Example• What’s -3 x -5?
!25
1101 1011x
![Page 26: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/26.jpg)
EE141
Multiplication in VerilogYou can use the “*” operator to multiply two numbers:
wire [9:0] a,b; wire [19:0] result = a*b; // unsigned multiplication!
If you want Verilog to treat your operands as signed two’s complement numbers, add the keyword signed to your wire or reg declaration:
wire signed [9:0] a,b; wire signed [19:0] result = a*b; // signed multiplication!
Remember: unlike addition and subtraction, you need different circuitry if your multiplication operands are signed vs. unsigned. Same is true of the >>> (arithmetic right shift) operator. To get signed operations all operands must be signed.
To make a signed constant: 10’sh37C
wire signed [9:0] a; wire [9:0] b; wire signed [19:0] result = a*$signed(b);
26
![Page 27: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/27.jpg)
EE141
Outline❑ Constant Coefficient
Multiplication ❑ Shifters ❑ Counters
27
![Page 28: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/28.jpg)
EE141
Constant Multiplication❑ Our multiplier circuits so far has assumed both the
multiplicand (A) and the multiplier (B) can vary at runtime. ❑ What if one of the two is a constant? Y = C * X ❑ “Constant Coefficient” multiplication comes up often in
signal processing and other hardware. Ex:
yi = αyi-1+ xi
where α is an application dependent constant that is hard-wired into the circuit.
❑ How do we build and array style (combinational) multiplier that takes advantage of the constancy of one of the operands?
xi yi
28
![Page 29: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/29.jpg)
EE141
Multiplication by a Constant❑ If the constant C in C*X is a power of 2, then the multiplication is simply
a shift of X. ❑ Ex: 4*X
❑ What about division?
❑ What about multiplication by non- powers of 2?
29
![Page 30: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/30.jpg)
EE141
Multiplication by a Constant❑ In general, a combination of fixed shifts and addition:
▪ Ex: 6*X = 0110 * X = (22 + 21)*X = 22 X + 21 X
▪ Details:
30
![Page 31: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/31.jpg)
EE141
Multiplication by a Constant❑ Another example: C = 2310 = 010111
❑ In general, the number of additions equals one less than the number of 1’s in the constant.
❑ Using carry-save adders (for all but one of these) helps reduce the delay and cost, and using balanced trees helps with delay, but the number of adders is still the number of 1’s in C minus 2.
❑ Is there a way to further reduce the number of adders (and thus the cost and delay)?
31
![Page 32: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/32.jpg)
EE141
Multiplication using Subtraction❑ Subtraction is approximately the same cost and delay as
addition. ❑ Consider C*X where C is the constant value 1510 = 01111. C*X requires 3 additions. ❑ We can “recode” 15 from 01111 = (23 + 22 + 21 + 20 ) to 10001 = (24 - 20 ) where 1 means negative weight. ❑ Therefore, 15*X can be implemented with only one subtractor.
32
<<4
![Page 33: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/33.jpg)
EE141
Canonic Signed Digit Representation❑ CSD represents numbers using 1, 1, & 0 with the least
possible number of non-zero digits. ▪ Strings of 2 or more non-zero digits are replaced. ▪ Leads to a unique representation.
❑ To form CSD representation might take 2 passes: ▪ First pass: replace all occurrences of 2 or more 1’s: 01..10 by 10..10 ▪ Second pass: same as above, plus replace 0110 by 0010
and 0110 by 0010 ❑ Examples:
❑ Can we further simplify the multiplier circuits?
0010111 = 23 0011001 0101001 = 32 - 8 - 1011101 = 29
100101 = 32 - 4 + 1
0110110 = 54 1011010 1001010 = 64 - 8 - 2
33
![Page 34: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/34.jpg)
EE141
“Constant Coefficient Multiplication” (KCM)Binary multiplier: Y = 231*X = (27 + 26 + 25 + 22 + 21+20)*X
❑ CSD helps, but the multipliers are limited to shifts followed by adds. ▪ CSD multiplier: Y = 231*X = (28 - 25 + 23 - 20)*X
❑ How about shift/add/shift/add …? ▪ KCM multiplier: Y = 231*X = 7*33*X = (23 - 20)*(25 + 20)*X
❑ No simple algorithm exists to determine the optimal KCM representation. ❑ Most use exhaustive search method.
34
![Page 35: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/35.jpg)
EE141
Shifters
![Page 36: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/36.jpg)
EE141
Fixed Shifters / Rotators Defined
Logical Shift
Rotate
Arithmetic Shift
36
![Page 37: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/37.jpg)
EE141
Variable Shifters / Rotators• Example: X >> S, where S is unknown when we synthesize the circuit. • Uses: shift instruction in processors (ARM includes a shift on every
instruction), floating-point arithmetic, division/multiplication by powers of 2, etc.
• One way to build this is a simple shift-register: a) Load word, b) shift enable for S cycles, c) read word.
– Worst case delay O(N) , not good for processor design. – Can we do it in O(logN) time and fit it in one cycle?
37
![Page 38: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/38.jpg)
EE141
Log Shifter / Rotator❑ Log(N) stages, each shifts (or not) by a power of 2 places, S=[s2;s1;s0]:
Shift by N/2
Shift by 2
Shift by 1
![Page 39: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/39.jpg)
EE141
LUT Mapping of Log shifter
Efficient with 2to1 multiplexors, for instance, 3LUTs.
Virtex6 has 6LUTs. Naturally makes 4to1 muxes:
Reorganize shifter to use 4to1 muxes.
Final stage uses F7 mux
![Page 40: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/40.jpg)
EE141
“Improved” Shifter / Rotator❑ How about this approach? Could it lead to even less delay?
❑ What is the delay of these big muxes? ❑ Look a transistor-level implementation?
40
x1 x2 x3 x4 x5 x6 x7 x0
Left-shift with rotate
![Page 41: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/41.jpg)
EE141
Barrel Shifter❑ Cost/delay?
▪ (don’t forget the decoder)
41
![Page 42: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/42.jpg)
EE141
Connection Matrix
❑ Generally useful structure: ▪ N2 control points. ▪ What other interesting
functions can it do?
![Page 43: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/43.jpg)
EE141
Cross-bar Switch❑ Nlog(N) control
signals. ❑ Supports all
interesting permutations ▪ All one-to-one and
one-to-many connections.
❑ Commonly used in communication hardware (switches, routers).
43
![Page 44: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/44.jpg)
EE141
Counters
![Page 45: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/45.jpg)
EE141
Counters❑ Special sequential circuits (FSMs) that repeatedly
sequence through a set of outputs. ❑ Examples:
▪ binary counter: 000, 001, 010, 011, 100, 101, 110, 111, 000,
▪ gray code counter: 000, 010, 110, 100, 101, 111, 011, 001, 000, 010, 110, … ▪ one-hot counter: 0001, 0010, 0100, 1000, 0001, 0010, … ▪ BCD counter: 0000, 0001, 0010, …, 1001, 0000, 0001 ▪ pseudo-random sequence generators: 10, 01, 00, 11, 10,
01, 00, ... ❑ Moore machines with “ring” structure in State
Transition Diagram: S3
S0
S2
S1
![Page 46: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/46.jpg)
EE141
What are they used?❑ Counters are commonly used in hardware designs because
most (if not all) computations that we put into hardware include iteration (looping). Examples: ▪ Shift-and-add multiplication scheme. ▪ Bit serial communication circuits (must count one “words worth” of
serial bits. ❑ Other uses for counter:
▪ Clock divider circuits
▪ Systematic inspection of data-structures – Example: Network packet parser/filter control.
❑ Counters simplify “controller” design by: ▪ providing a specific number of cycles of action, ▪ sometimes used with a decoder to generate a sequence of timed
control signals. ▪ Consider using a counter when FSM has many states with few
branches.
1/416MHz 16MHz
![Page 47: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/47.jpg)
EE141
Controller using Counters❑ Example, Bit-serial multiplier (n2 cycles, one bit of result per n cycles):
❑ Control Algorithm:repeat n cycles { // outer (i) loop repeat n cycles{ // inner (j) loop shiftA, selectSum, shiftHI } shiftB, shiftHI, shiftLOW, reset }
Note: The occurrence of a control signal x means x=1. The absence of x means x=0.
![Page 48: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/48.jpg)
EE141
Controller using Counters• State Transition Diagram:
▪ Assume presence of two binary counters. An “i” counter for the outer loop and “j” counter for inner loop.
TC is asserted when the counter reaches it maximum count value. CE is “count enable”. The counter increments its value on the rising edge of the clock if CE is asserted.
![Page 49: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/49.jpg)
EE141
Controller using Counters• Controller circuit
implementation:• Outputs: CEi = q2
CEj = q1
RSTi = q0
RSTj = q2
shiftA = q1
shiftB = q2
shiftLOW = q2
shiftHI = q1 + q2
reset = q2
selectSUM = q1
![Page 50: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/50.jpg)
EE141
How do we design counters?❑ For binary counters (most common case) incrementer circuit
would work:
❑ In Verilog, a counter is specified as: x = x+1; ▪ This does not imply an adder ▪ An incrementer is simpler than an adder
❑ In general, the best way to understand counter design is to think of them as FSMs, and follow general procedure, however some special cases can be optimized.
register
+1
![Page 51: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/51.jpg)
EE141
Synchronous Counters
❑ Binary Counter Design: Start with 3-bit version and
generalize:
c b a c+ b+ a+
0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0
a+ = a’ b+ = a ⊕ b c+ = abc’ + a’b’c + ab’c + a’bc = a’c + abc’ + b’c = c(a’+b’) + c’(ab) = c(ab)’ + c’(ab) = c ⊕ ab
All outputs change with clock edge.
![Page 52: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/52.jpg)
EE141
Synchronous Counters❑ How do we extend to n-bits? ❑ Extrapolate c+: d+ = d ⊕ abc, e+ = e ⊕ abcd
❑ Has difficulty scaling (AND gate inputs grow with n)
❑ CE is “count enable”, allows external control of counting, ❑ TC is “terminal count”, is asserted on highest value, allows
cascading, external sensing of occurrence of max value.
TC
![Page 53: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/53.jpg)
EE141
Synchronous CountersTC
• How does this one scale? ☹ Delay grows α n
• Generation of TC signals similar to generation of carry signals in adder.
• “Parallel Prefix” circuit reduces delay:
log2n
log2n
Alternative Parallel Prefix Circuit
![Page 54: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/54.jpg)
EE141
Odd Counts❑ Extra combinational logic can be
added to terminate count before max value is reached:
❑ Example: count to 12
• Alternative loadable counter:
![Page 55: lec18-mult-count-shiftinst.eecs.berkeley.edu/~eecs151/sp20/files/lec18... · 2020. 4. 16. · Page Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3b 0](https://reader033.fdocuments.net/reader033/viewer/2022052809/6079c5accc210029af28a8ff/html5/thumbnails/55.jpg)
EE141
Ring Counters❑ “one-hot” counters 0001, 0010, 0100, 1000, 0001, …
“Self-starting” version: