Integer Multipliers
Multipliers
• A must-have circuit in most DSP applications
• A variety of multipliers exists; one is chosen based on its performance
• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...
[Figure: multiplier block symbol with inputs A and B and output P]
Multiplication Algorithm
X = Xn-1 Xn-2 ... X0   (Multiplicand)
Y = Yn-1 Yn-2 ... Y0   (Multiplier)

Partial products (each row is shifted one bit position to the left of the row above it):
Yn-1X0     Yn-2X0     Yn-3X0     ...  Y1X0     Y0X0
Yn-1X1     Yn-2X1     Yn-3X1     ...  Y1X1     Y0X1
Yn-1X2     Yn-2X2     Yn-3X2     ...  Y1X2     Y0X2
...
Yn-1Xn-2   Yn-2Xn-2   Yn-3Xn-2   ...  Y1Xn-2   Y0Xn-2
Yn-1Xn-1   Yn-2Xn-1   Yn-3Xn-1   ...  Y1Xn-1   Y0Xn-1
-----------------------------------------------------
P2n-1  P2n-2  P2n-3  ...  P2  P1  P0
1. Multiplication Algorithms
Implementation of multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P. First generate the 64 partial products, then add them up.

A = A7 A6 A5 A4 A3 A2 A1 A0
B = B7 B6 B5 B4 B3 B2 B1 B0

A7.B0 A6.B0 A5.B0 A4.B0 A3.B0 A2.B0 A1.B0 A0.B0
A7.B1 A6.B1 A5.B1 A4.B1 A3.B1 A2.B1 A1.B1 A0.B1
A7.B2 A6.B2 A5.B2 A4.B2 A3.B2 A2.B2 A1.B2 A0.B2
A7.B3 A6.B3 A5.B3 A4.B3 A3.B3 A2.B3 A1.B3 A0.B3
A7.B4 A6.B4 A5.B4 A4.B4 A3.B4 A2.B4 A1.B4 A0.B4
A7.B5 A6.B5 A5.B5 A4.B5 A3.B5 A2.B5 A1.B5 A0.B5
A7.B6 A6.B6 A5.B6 A4.B6 A3.B6 A2.B6 A1.B6 A0.B6
A7.B7 A6.B7 A5.B7 A4.B7 A3.B7 A2.B7 A1.B7 A0.B7
(each row is shifted one bit position to the left of the row above it)
P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0

The equation is:
\[ P(m+n) = A(m)\,B(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i \, b_j \, 2^{i+j} \]
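The partial-product formulation above can be checked numerically. The Python sketch below is illustrative only: the function names `partial_products` and `multiply` and the default widths are assumptions made for this example, not something taken from the slides. It builds the m*n AND terms and sums each one with weight 2^(i+j).

```python
def partial_products(a: int, b: int, m: int = 8, n: int = 8) -> list[list[int]]:
    """Return the AND terms a_i & b_j, one row per bit b_j of the second operand."""
    return [[(a >> i) & (b >> j) & 1 for i in range(m)] for j in range(n)]


def multiply(a: int, b: int, m: int = 8, n: int = 8) -> int:
    """Sum every partial-product bit a_i * b_j with weight 2**(i + j)."""
    rows = partial_products(a, b, m, n)
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))


if __name__ == "__main__":
    # 8-bit operands give 64 partial-product bits and a product of up to 16 bits.
    print(hex(multiply(0x89, 0xAB)))
```

For 8-bit operands `partial_products` returns 8 rows of 8 bits, which corresponds to the 64 partial products mentioned in the text.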
[Figure sequence, Slides 1 to 21: step-by-step operation of a 4-bit serial multiplication. The datapath consists of gates G1 and G2, a 1-bit adder (+), a 1-bit register (1-bit REG, with d, q and Reset) and a serial register, clocked by CLK and CLK/(N+1). Reset = 1 on Slides 5, 10, 15 and 20 and Reset = 0 on the other slides. On the last slide the result bits S7 ... S0 have all been produced.]

X: x3x2x1x0    Y: y3y2y1y0
Input Sequence for G1:
0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0
0 0y3y3y3y3 0y2y2y2y2 0y1y1y1y1 0y0y0y0y0
Reset: 0 10000 10000 10000 10000

Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
[Figure sequence, Slides 1 to 8: step-by-step operation of a 4-bit multiplier in which y3, y2, y1 and y0 are applied in parallel while x0, x1, x2 and x3 are shifted in one bit per slide (followed by zeros) through D flip-flops (D) and adders (+). One result bit appears per slide, from S0 on Slide 1 to S7 on Slide 8.]

Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
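A multiplier with one serial operand and one parallel operand can be modelled in a few lines. The Python sketch below is a generic carry-save serial/parallel model under stated assumptions: the function name, the register names `s` and `c`, and the cell arrangement are illustrative and are not claimed to match the exact circuit drawn on the slides. One bit of x enters per clock, every cell adds its AND term to the sum shifted down from its left neighbour and to its own saved carry, and one product bit leaves per clock, LSB first.

```python
def serial_parallel_multiply(x: int, y: int, n: int = 4) -> int:
    """Bit-serial x, parallel y: produce the 2n product bits one per clock, LSB first."""
    y_bits = [(y >> j) & 1 for j in range(n)]
    s = [0] * (n + 1)   # sum flip-flops; s[n] is a constant 0 feeding the top cell
    c = [0] * n         # carry flip-flops, one per cell
    product = 0
    for t in range(2 * n):
        x_t = (x >> t) & 1 if t < n else 0      # x is followed by n zero bits
        new_s = [0] * (n + 1)
        new_c = [0] * n
        for j in range(n):
            # Full adder of cell j: AND term + sum from cell j+1 + own saved carry.
            total = (x_t & y_bits[j]) + s[j + 1] + c[j]
            new_s[j] = total & 1
            new_c[j] = total >> 1
        s, c = new_s, new_c
        product |= s[0] << t                    # cell 0 emits product bit t
    return product


if __name__ == "__main__":
    print(serial_parallel_multiply(0b1011, 0b1101))
```

The model needs 2n clocks for an n-bit multiplication, which is the usual price of the small area of a serial datapath.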
Shift Add Multiplier Design Implementation
[Figure: datapath with inputs Ain (7 downto 0) and Bin (7 downto 0), registers REGA, REGB and REGC, an 8-bit adder, a MUX with a 0 input, CLOCK, and outputs Result (15 downto 8) and Result (7 downto 0)]
Synchronous Shift and Add Multiplier Controller Design
Multiplication process: 5 states: Idle, Init, Test, Add, and Shift&Count.
• Idle: The multiplication starts when the Start signal is received.
• Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
• Test: The LSB of the shift register that holds the multiplier is tested to decide the next state.
• Add: If the LSB is '1', the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state.
• Shift&Count: If the LSB is '0', or once the Add is done, the two shift registers shift their contents one bit to the right and the counter counts up by one. The state machine then returns to the Test state.
• When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
• Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
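The five-state controller described above can be sketched as a small software state machine. This Python sketch is an assumption-laden illustration: the state names follow the slide, but the function name `shift_add_fsm`, the variable names, and the way Start and Done are modelled are choices made for this example only.

```python
def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 8):
    """Run the Idle/Init/Test/Add/Shift&Count sequence; return (product, state trace)."""
    state, trace = "IDLE", []
    start = True                      # models the Start signal
    m = q = acc = count = 0
    while True:
        trace.append(state)
        if state == "IDLE":
            if not start:             # back in Idle after the run: Done is asserted
                break
            start = False
            state = "INIT"
        elif state == "INIT":         # load operands, clear accumulator and counter
            m, q, acc, count = multiplicand, multiplier, 0, 0
            state = "TEST"
        elif state == "TEST":         # examine the LSB of the multiplier register
            state = "ADD" if q & 1 else "SHIFT_COUNT"
        elif state == "ADD":          # add the multiplicand to the accumulated result
            acc += m
            state = "SHIFT_COUNT"
        elif state == "SHIFT_COUNT":  # shift accumulator and multiplier right together
            q = (q >> 1) | ((acc & 1) << (n - 1))
            acc >>= 1
            count += 1
            state = "IDLE" if count == n else "TEST"   # counter == N asserts Stop
    return (acc << n) | q, trace


if __name__ == "__main__":
    product, trace = shift_add_fsm(0b1011, 0b1101, n=4)
    print(bin(product), trace)
```

In this model a multiplier bit of '1' costs two states (Add, then Shift&Count) and a bit of '0' costs one, so the run time depends on the multiplier value.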
[Figure, Slide 1: n-bit shift-and-add multiplier. The Multiplicand register and register A (An-1 ... A0, with carry bit C) feed an n-bit adder; the Multiplier is held in register Q (Qn-1 ... Q0); the Shift and Add Control Logic drives the Add and Shift Right signals.]
n-bit Multiplier:
Q0=1: Multiplicand is added to register A; the result is stored in register A; registers C, A, Q are shifted to the right one bit
Q0=0: Registers C, A, Q are shifted to the right one bit
Example: 4-bit Multiplier (Slides 2 to 9)
Multiplicand = 1 0 1 1, Multiplier (initial Q) = 1 1 0 1

Step                     Add   Shift Right   C   A      Q
Initial Values           -     -             0   0000   1101
First Cycle--Add         1     0             0   1011   1101
First Cycle--Shift       0     1             0   0101   1110
Second Cycle--Shift      0     1             0   0010   1111
Third Cycle--Add         1     0             0   1101   1111
Third Cycle--Shift       0     1             0   0110   1111
Fourth Cycle--Add        1     0             1   0001   1111
Fourth Cycle--Shift      0     1             0   1000   1111

The product is read from A and Q together: 1000 1111.
4*4 Synchronous Shift and Add Multiplier Design: Layout Design
[Figure: floor plan of the 4*4 Synchronous Shift and Add Multiplier]
Example (simulated by Ovais Ahmed, Fall_03, project):
Multiplicand = 10001001₂ = 89₁₆
Multiplier = 10101011₂ = AB₁₆
Expected Result = 101101110000011₂ = 5B83₁₆
Array Multiplier
• Regular structure based on the add and shift algorithm.
• Addition is mainly done by the carry save algorithm.
• Sign bit extension results in a higher capacitive load and slows down the circuit.
Addition with CLA
[Figure: three four-bit adders, chained through Cin and Cout, add the rows a3a2a1a0 gated by b0, b1, b2 and b3 to form Product (A*B)]
A = a3a2a1a0
B = b3b2b1b0
Array Multiplier with CSA
[Figure: 4*4 carry-save array multiplier. Twelve full adders (F.A, outputs Ci and Si) reduce the partial products P00 ... P33 to the result bits R7 ... R0. Inputs: A3 A2 A1 A0 and B3 B2 B1 B0.]
Pij = Ai Bj, with 0 ≤ i ≤ 3 and 0 ≤ j ≤ 3
Total of 16 gates
Critical Path with Array Multipliers
[Figure: three rows of HA FA FA FA cells, with two of the possible paths for the ripple-carry based 4*4 multiplier marked]
Area = (N*N) AND gates + (N-1)N full adders
Delay = τHA + (2N-1) τFA
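A bit-level model makes the array structure and the cost formulas above concrete. The Python sketch below is a simplified ripple-carry array under stated assumptions: the names `full_adder`, `array_multiply` and `array_cost` are illustrative, the model uses a full adder in every position (a real array uses no adders for the first row and half adders where one input is absent), and `array_cost` simply evaluates the formulas quoted on the slide.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry-out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Add one row of AND terms per multiplier bit with a ripple-carry adder row."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    acc = [0] * (2 * n)                    # running sum bits, LSB first
    for j in range(n):                     # row j handles multiplier bit b_j
        carry = 0
        for i in range(n):
            pp = a_bits[i] & b_bits[j]     # AND gate producing the partial-product bit
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[j + n] = carry                 # row carry-out feeds the next row's top bit
    return sum(bit << k for k, bit in enumerate(acc))


def array_cost(n: int) -> dict[str, int]:
    """Gate counts and delay (in full-adder delays, plus one half adder) from the slide."""
    return {"and_gates": n * n, "full_adders": (n - 1) * n, "delay_fa_units": 2 * n - 1}


if __name__ == "__main__":
    print(array_multiply(11, 13), array_cost(4))
```

For N = 4 the formulas give 16 AND gates and 12 adder cells, which agrees with the three rows of four cells in the critical-path figure.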
Wallace Tree
[Figure: Wallace tree for a 5*5 multiplication. The partial products xiyj (0 ≤ i, j ≤ 4) are grouped by column weight i + j and reduced by adders (+) to the product bits P9 ... P0.]
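The column-wise reduction used by a Wallace tree can be sketched in software. The Python code below is a simplified illustration under stated assumptions: it uses only 3:2 counters (full adders) and passes leftover bits through, whereas a textbook Wallace tree also uses half adders; the function name `wallace_multiply` and the returned stage count are choices made for this example. The final integer sum stands in for the carry-propagate adder that follows the tree.

```python
def wallace_multiply(x: int, y: int, n: int = 5) -> tuple[int, int]:
    """Reduce the partial-product columns with 3:2 counters; return (product, stages)."""
    # Column k collects every partial-product bit x_i & y_j with i + j == k.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((x >> i) & (y >> j) & 1)

    stages = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            full = len(col) // 3 * 3
            for t in range(0, full, 3):
                a, b, c = col[t:t + 3]
                nxt[k].append(a ^ b ^ c)                        # sum stays in column k
                nxt[k + 1].append((a & b) | (a & c) | (b & c))  # carry moves to column k+1
            nxt[k].extend(col[full:])                           # 0 to 2 leftover bits pass through
        cols = nxt
        stages += 1

    # At most two bits remain per column; a carry-propagate adder would finish the job.
    product = sum(bit << k for k, col in enumerate(cols) for bit in col)
    return product, stages


if __name__ == "__main__":
    print(wallace_multiply(0b10111, 0b11001))
```

All counters within one stage work in parallel, so the delay of the tree grows with the number of stages rather than with the number of partial-product rows.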
04/18/23 Concordia VLSI Lab
Background
Baugh-Wooley Algorithm
• Convert negative partial products to positive representation
• No sign-extension required
\[ X \cdot Y = \Bigl(-x_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i\Bigr) \Bigl(-y_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} y_i 2^i\Bigr) \]
\[ = x_{k-1} y_{k-1} 2^{2k-2} + \sum_{i=0}^{k-2} \sum_{j=0}^{k-2} x_i y_j 2^{i+j} - 2^{k-1} \sum_{i=0}^{k-2} x_{k-1} y_i 2^i - 2^{k-1} \sum_{i=0}^{k-2} y_{k-1} x_i 2^i \]
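The expansion of a k-bit two's complement product into one positive MSB term, one positive double sum and two negative sums can be verified exhaustively for small k. The Python sketch below is only a numerical check under stated assumptions: the names `to_signed` and `baugh_wooley_terms` are illustrative, and the code evaluates the four groups of terms directly; it does not model the complemented-bit array that the Baugh-Wooley scheme uses to turn the negative terms into positive ones.

```python
def to_signed(v: int, k: int) -> int:
    """Interpret the k-bit pattern v as a two's complement integer."""
    return v - (1 << k) if (v >> (k - 1)) & 1 else v


def baugh_wooley_terms(x: int, y: int, k: int = 5) -> int:
    """Evaluate the four groups of terms of the k-bit two's complement product."""
    xb = [(x >> i) & 1 for i in range(k)]
    yb = [(y >> i) & 1 for i in range(k)]
    msb = (xb[k - 1] * yb[k - 1]) << (2 * k - 2)            # sign bit times sign bit
    pos = sum((xb[i] * yb[j]) << (i + j)                    # magnitude bits times magnitude bits
              for i in range(k - 1) for j in range(k - 1))
    neg_x = sum((xb[k - 1] * yb[i]) << (i + k - 1) for i in range(k - 1))  # x sign bit row
    neg_y = sum((yb[k - 1] * xb[i]) << (i + k - 1) for i in range(k - 1))  # y sign bit row
    return msb + pos - neg_x - neg_y


if __name__ == "__main__":
    # 5-bit patterns 0b11101 (-3) and 0b00111 (+7)
    print(baugh_wooley_terms(0b11101, 0b00111))
```

Only the two rows that involve a sign bit carry a negative weight, which is why handling those two rows specially removes the need for sign extension.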
Example of a 5-by-5 Baugh-Wooley multiplier
[Figure: the schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier, built from rows of full adders (FA) with outputs P9 ... P0]
Inputs to the adder rows (a prime marks a complemented bit):
a4b0'  a3b0   a2b0   a1b0   a0b0
a4b1'  a3b1   a2b1   a1b1   a0b1
a4b2'  a3b2   a2b2   a1b2   a0b2
a4b3'  a3b3   a2b3   a1b3   a0b3
a4b4   a3'b4  a2'b4  a1'b4  a0'b4
Additional inputs: a4', b4', a4, b4 and a constant 1
a7 a6 a5 a4 a3 a2 a1 a0
* a7 a6 a5 a4 a3 a2 a1 a0
-------------
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
-------------
a7*a6 a7*a5 a7*a4 a7*a3 a7*a2 a7*a1 a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 ‘0' a0
a7*a7 a6*a5 a6*a4 a6*a3 a6*a2 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1
a6*a6 a5*a4 a5*a3 a5*a2 a4*a2 a3*a2 a2*a2
a5*a5 a4*a3 a3*a3
a4*a4
S15, S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0
Example of an 8-bit squarer (N*N, N = 8 bits)
[Figure: reduced partial-product array of the 8-bit squarer, with inputs a0 ... a7, cross terms aiaj, constant '0' inputs, and outputs S15 ... S0]
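The reduction used by the squarer follows from two identities: ai*ai = ai, and ai*aj + aj*ai = 2*ai*aj, so each pair of equal cross terms becomes a single term one column further left. The Python sketch below illustrates this folding; the function names `square_folded` and `folded_term_count` are assumptions made for the example and do not describe the adder arrangement in the figure.

```python
def square_folded(a: int, n: int = 8) -> int:
    """Square an n-bit unsigned number using the folded partial-product matrix."""
    bits = [(a >> i) & 1 for i in range(n)]
    # Diagonal terms: a_i * a_i = a_i, placed in column 2i.
    diag = sum(bits[i] << (2 * i) for i in range(n))
    # Cross terms: a_i*a_j appears twice in the full matrix, so it is kept once
    # and moved one column left, to column i + j + 1.
    cross = sum((bits[i] & bits[j]) << (i + j + 1)
                for i in range(n) for j in range(i + 1, n))
    return diag + cross


def folded_term_count(n: int) -> int:
    """Number of terms left after folding: n diagonal plus n(n-1)/2 cross terms."""
    return n + n * (n - 1) // 2


if __name__ == "__main__":
    print(square_folded(0xAB), folded_term_count(8))
```

For N = 8 the folded matrix holds 36 terms instead of the 64 of a general 8*8 multiplier.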
Booth (Radix-4) Multiplier
Radix-4 (3-bit recoding) halves the number of partial products to be added. This gives a great saving in area and increased speed.
\[ A = -a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + a_{n-3} 2^{n-3} + \dots + a_1 2 + a_0 \]
\[ B = -b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + \dots + b_1 2 + b_0 \]
The base-4 redundant signed-digit representation of B is
\[ B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i \]
Ki is calculated by the following equation:
\[ K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \dots, (n-2)/2 \]
Three bits of the multiplier B, b2i+1, b2i and b2i-1, are examined and the corresponding Ki is calculated.
B is always appended on the right with a zero (b-1 = 0), and n is always even (B is sign extended if needed). The product AB is then obtained by adding n/2 partial products:
\[ AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A \]
Booth Algorithm
Decoding of multiplier to generate signals for hardware use

Xi+1  Xi  Xi-1    OP  NEG  ZERO  TWO
0     0   0       0   0    1     0
1     0   0       2   1    0     1
0     1   0       1   0    0     0
1     1   0       1   1    0     0
0     0   1       1   0    0     0
1     0   1       1   1    0     0
0     1   1       2   0    0     1
1     1   1       0   1    1     0
Booth Algorithm
• A Booth recoded multiplier examines three bits of the multiplier at a time.
• It determines whether to add 0, +1, -1, +2, or -2 times the multiplicand at that rank.
• The operation to be performed is based on the current two bits of the multiplier and the previous bit.

Xi+1  Xi  Xi-1    Zi/2
0     0   0       0
0     0   1       1
0     1   0       1
0     1   1       2
1     0   0       -2
1     0   1       -1
1     1   0       -1
1     1   1       0
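The recoding rule Ki = -2*b(2i+1) + b(2i) + b(2i-1), with b(-1) = 0, translates directly into code. The Python sketch below is illustrative: the function names `booth_radix4_digits` and `booth_multiply` are assumptions for this example, the multiplier is passed as a signed integer and masked to n bits, and the partial products are added as plain integers instead of sign-extended bit vectors.

```python
def booth_radix4_digits(pattern: int, n: int) -> list[int]:
    """Recode an n-bit two's complement pattern (n even) into digits K_i, LSB first."""
    if n % 2:
        raise ValueError("n must be even; sign extend the multiplier by one bit")

    def bit(k: int) -> int:
        # b(-1) is the zero appended on the right of the multiplier
        return 0 if k < 0 else (pattern >> k) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply signed a (multiplicand) by signed b (multiplier, n bits) via n/2 digits."""
    digits = booth_radix4_digits(b & ((1 << n) - 1), n)
    # Partial product i is K_i * A, weighted by 4**i (a shift by two bits per digit).
    return sum(k * a * 4 ** i for i, k in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(0b011101, 6), booth_multiply(3, 29, 6))
```

Every digit lies in the set {-2, -1, 0, +1, +2}, so each partial product is obtained from the multiplicand by at most a one-bit shift and a negation.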
BIT                   OPERATION                                              M is multiplied by
2^1   2^0   2^-1
Xi    Xi+1  Xi+2
0     0     0         add zero (no string)                                   +0
0     0     1         add the multiplicand (end of string)                   +X
0     1     0         add the multiplicand (a string)                        +X
0     1     1         add twice the multiplicand (end of string)             +2X
1     0     0         subtract twice the multiplicand (beginning of string)  -2X
1     0     1         subtract the multiplicand (-2X and +X)                 -X
1     1     0         subtract the multiplicand (beginning of string)        -X
1     1     1         subtract zero (center of string)                       -0
Booth Algorithm: a higher radix multiplication
Multiplicand A = ● ● ● ●        Multiplier B = (● ●)(● ●)
Partial product bits ● ● ● ●    (B1B0)₂ · A · 4^0
Partial product bits ● ● ● ●    (B3B2)₂ · A · 4^1
Product P = ● ● ● ● ● ● ● ●
Example
The following example shows how the calculation is carried out.
Multiplicand X = 000011
Multiplier Y = 011101
Y with a 0 added to the multiplier on the right: 0 1 1 1 0 1 0
After Booth decoding, Y is recoded so that X is multiplied by +2, -1 and +1 separately; each partial product is then shifted two bits relative to the previous one and they are added together.

X * +1    000000000011
X * -1    1111111101
X * +2    00000110
          ------------
          000001010111
Sign extension
Traditional sign-extension scheme
• Segment the input operands based on the size of the embedded blocks
• Multiply the segmented inputs and extend the sign bit of each partial product
• Sum all partial products
[Figure: segmented input operands are multiplied (×), the partial products are sign-extended, and they are summed (+) into the final result]
Booth Algorithm: Example 1
Multiplier:    011101 (+29), with a 0 appended on the right
Multiplicand:  000011 (+3)
Recoded multiplier digits: +2 -1 +1

* +1    000000000011
* -1    1111111101
* +2    00000110
        ------------
        000001010111  (+87)
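Booth Algorithm: Example 2
Multiplier:    011101 (+29), with a 0 appended on the right
Multiplicand:  111101 (-3)
Recoded multiplier digits: +2 -1 +1

* +1    111111111101
* -1    0000000011      (2's complement of multiplicand)
* +2    11111010
        ------------
        111110101001  (-87)
Notice the sign extensions.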
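Booth Algorithm: Example 3
Multiplier:    100011 (-29), with a 0 appended on the right
Multiplicand:  111101 (-3)
Recoded multiplier digits: -2 +1 -1

* -1    000000000011
* +1    1111111101
* -2    00000110
        ------------
        000001010111  (+87)
Slide annotation: shifted 2's complement.
Notice the sign extensions.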
Template to reduce sign extensions for Booth Algorithm
For hardware implementation
Please note that each operand is 17 bits wide, i.e. the 17th bit is the sign bit. Negative numbers are entered in 1's complement, which is why the S must be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right side of the diagram can be removed.
Comparison of Template and the sign extension
[Figure: two partial-product layouts for operands A and B with product P. Sign extension: the sign bits are replicated as S1S1S1S1S1S1S1, S2S2S2S2S2, S3S3S3 and S4. Sign template: the leading entries are reduced to S1S1S1, S21 and S3.]
[Figure: partial product matrix generated for a 16 * 16 bit multiplication, using Booth and the template given in the previous slide. Columns are numbered 32 down to 0; the nine partial-product rows A0 to A8 carry the sign entries S0 to S8 and the template 1s.]
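Using the Template: 25 * -35
Example of using the template: 25 * -35, with -35 as the multiplier, using 8-bit representation.

Multiplicand (25):                    0 0 0 1 1 0 0 1
Multiplier (-35) with 0 appended:     1 1 0 1 1 1 0 1 0

Partial products with template bits (each row is shifted two bit positions to the left of the row above it):
1 0 0 0 0 0 1 1 0 0 1     * 1
1 0 1 1 1 0 0 1 1 1       * -1
1 0 0 1 1 0 0 1 0         * 2
1 1 0 0 1 1 1             * -1
Slide annotations on the template bits: Sign bit; Add S S; Add inverted S; Add inverted sign and add 1; Add inverted sign bit; No sign bit.

Sum:               1 1 1 1 0 0 1 0 0 1 0 1 0 1    This is a negative number. Convert it:
2's complement:    0 0 0 0 1 1 0 1 1 0 1 0 1 1
512 + 256 + 64 + 32 + 8 + 2 + 1 = 875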
Booth Multiplier Components
• Multiplier
• Multiplicand
• Booth Encoder
• PPU (Partial products unit)
• PPA (Partial products adding unit)
• Product
Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier With Pipeline
[Figure: a Wallace tree of adders (+) reduces Partial Product PP0, PP1, PP2 (15 downto 0) and Partial Product PP3 (15 downto 0); a pipeline register and a ripple carry adder follow and produce P16 ... P0. The critical path is marked.]
[Figure: Hardware implementation of Booth with shift and add. Blocks shown: FSM, Counter20, Adder37, Mux37, Register37, reg_2left32 shift registers, reg2right17, mux4-32, 2's complement, *2 (shifter), sign expansion and endcheck. Control signals include Start, Stop, Init, Shift, Doubleshift, Mulbegin, Mulend, Finish, Mux0, Mux11, Mux12, CLK and CLR.]
84
Simulation Plan
[Figure: Two 32-bit signal generators drive A[31:0] and B[31:0] into both a behavioral multiplier (A * B, output P[63:0]) and the multiplier under test ("My Multiplier", output My_P[63:0]). A 64-bit comparator compares the two products and reports Result and Failed Number.]
"My Multiplier" is one of:
• Array Multiplier
• Modified Booth Multiplier
• Wallace Tree Multiplier
• Modified Booth-Wallace Tree Multiplier
• Twin Pipe Serial-Parallel Multiplier
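The simulation plan amounts to driving the same random operands into a reference and a design under test and counting mismatches. The sketch below models that in Python; `shift_add_multiply` is a hypothetical stand-in for "My Multiplier", and `run_plan`, the trial count and the seed are assumptions made for illustration.

```python
import random


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shift_add_multiply(a: int, b: int, bits: int = 32) -> int:
    """Hypothetical design under test: signed shift-and-add multiplication."""
    pattern = b & ((1 << bits) - 1)
    acc = 0
    for i in range(bits - 1):
        if (pattern >> i) & 1:
            acc += a << i
    if (pattern >> (bits - 1)) & 1:      # the sign bit has negative weight
        acc -= a << (bits - 1)
    return acc


def run_plan(dut, trials: int = 1000, bits: int = 32, seed: int = 0) -> int:
    """Return the number of failed comparisons against the behavioral A * B."""
    rng = random.Random(seed)            # plays the role of the signal generators
    failed = 0
    for _ in range(trials):
        a = rng.randrange(-(1 << (bits - 1)), 1 << (bits - 1))
        b = rng.randrange(-(1 << (bits - 1)), 1 << (bits - 1))
        p = to_signed(a * b, 2 * bits)                 # behavioral multiplier
        my_p = to_signed(dut(a, b, bits), 2 * bits)    # multiplier under test
        if p != my_p:                                  # 64-bit comparator
            failed += 1
    return failed


print(run_plan(shift_add_multiply))
```

Swapping `shift_add_multiply` for a model of any of the listed multipliers keeps the rest of the plan unchanged.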
87
Simulation For Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the delay of the D flip-flops.
89
Another implementation of the above after pipelining; place and route has placed the design in different locations.
92
Comparison of Multipliers
Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
| Area, total CLBs (#) | 3076.50 | 2649.50 | 3325.50 | 2672.50 | 490.00 | 2993.50 |
| Maximum delay D (ns) | 35.78 | 24.43 | 18.93 | 18.53 | 107.52 (3.36 x 32) | 49.33 |
| Total dynamic power P (W) | 7.52 | 6.33 | 7.46 | 6.41 | 0.28 | 6.24 |
| Delay·Power product DP (ns W) | 268.98 | 154.64 | 141.14 | 118.76 | 30.62 | 307.58 |
| Area·Power product AP (# W) | 23128.20 | 16771.60 | 24793.93 | 17127.79 | 139.54 | 18665.07 |
| Area·Delay product AD (# ns) | 1.10E+05 | 6.47E+04 | 6.30E+04 | 4.95E+04 | 5.27E+04 | 1.48E+05 |
| Area·Delay² product AD² (# ns²) | 3.94E+06 | 1.58E+06 | 1.19E+06 | 9.18E+05 | 5.66E+06 | 7.28E+06 |
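The derived rows of the table (DP, AP, AD, AD²) are simple products of the three measured quantities. A minimal sketch, using the Array Multiplier column as input; the function name `figures_of_merit` is an assumption, and small differences from the published values are expected, presumably because the printed inputs are rounded.

```python
def figures_of_merit(area: float, delay: float, power: float) -> dict[str, float]:
    """Combine area (# CLBs), delay (ns) and power (W) into the table's products."""
    return {
        "DP": delay * power,        # delay-power product (ns W)
        "AP": area * power,         # area-power product (# W)
        "AD": area * delay,         # area-delay product (# ns)
        "AD2": area * delay ** 2,   # area-delay-squared product (# ns^2)
    }


array_signed = figures_of_merit(area=3076.50, delay=35.78, power=7.52)
for name, value in array_signed.items():
    print(f"{name}: {value:.4g}")
```

AD² weights delay more heavily than area, which is why the fast tree multipliers rank best on that row even though they are not the smallest.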
93
Comparison of Multipliers
Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
| Area, total CLBs (#) | 3280.50 | 2800.00 | 3321.50 | 2845.50 | 487.00 | 3003.00 |
| Maximum delay D (ns) | 37.23 | 25.33 | 18.93 | 18.33 | 107.52 | 44.50 |
| Total dynamic power P (W) | 7.57 | 6.66 | 7.32 | 6.66 | 0.29 | 6.26 |
| Delay·Power product DP (ns W) | 281.88 | 168.77 | 138.60 | 122.13 | 30.66 | 278.53 |
| Area·Power product AP (# W) | 24837.98 | 18656.40 | 24319.36 | 18959.57 | 138.89 | 18795.78 |
| Area·Delay product AD (# ns) | 1.22E+05 | 7.09E+04 | 6.29E+04 | 5.22E+04 | 5.24E+04 | 1.34E+05 |
| Area·Delay² product AD² (# ns²) | 4.55E+06 | 1.80E+06 | 1.19E+06 | 9.56E+05 | 5.63E+06 | 5.95E+06 |
94
Comparison of Multipliers
[Figure: The relation of Area and Delay for the behavioral multiplier, the "banana curve". Horizontal axis: Delay (ns), 0 to 80; vertical axis: Area (#), 2950 to 3250.]
Change the value of "set_max_delay" in the script file (ns):

| set_max_delay (ns) | 0 | 10 | 20 | 30 | 40 | 50 | 60 | >60 |
| Area (#) | 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5 | 2978.5 |
| Power (W) | 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156 | 8.0156 |
| Delay (ns) | 31.98 | 31.98 | 30.93 | 30.08 | 39.93 | 49.88 | 59.63 | 59.63 |
95
Comparison of Multipliers
By Chen Yaoquan, M.Eng. 2005

| Property | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
| Area | Medium | Small | Large | Small | Smallest | Medium |
| Critical delay | Medium | Fast | Very Fast | Fastest | Very Large | Large |
| Power consumption | Large | Medium | Large | Medium | Smallest | Medium |
| Complexity | Simple | Complex | More Complex | More Complex | Simple | Simplest |
| Implementation | Easy | Medium | Difficult | Difficult | Easy | Easiest |
97
Synthesis for Signed Multipliers
[Figure: synthesis results in six panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral.]
98
Synthesis for Unsigned Multipliers
[Figure: synthesis results in six panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral.]
99
Conclusion
• Modified Booth and Wallace Tree are the best techniques for high speed multiplication.
• Wallace Tree has the best performance, but it is hard to implement.
• Booth algorithm based multipliers have lower area among parallel multipliers.
• For behavioral multipliers, the area will increase while the delay decreases.
100
Comparison

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier |
| Area, total CLBs (#) | 1165 | 1292 | 1659 | 1239 | 133 |
| Maximum delay (ns) | 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56) |
| Power consumption at highest speed (mW) | 16.6506 (at 188 ns) | 23.136 (at 140 ns) | 30.95 (at 101.14 ns) | 30.862 (at 101.43 ns) | 2.089 (at 722.56 ns) |
| Delay·Power product DP (ns mW) | 3128.15 | 3225.39 | 3130.28 | 3130.33 | 1509.42 |
| Area·Power product AP (# mW) | 19.397 x 10^3 | 29.891 x 10^3 | 51.346 x 10^3 | 38.238 x 10^3 | 277.837 |
| Area·Delay product AD (# ns) | 218.868 x 10^3 | 180.118 x 10^3 | 167.791 x 10^3 | 125.671 x 10^3 | 96.101 x 10^3 |
| Area·Delay² product AD² (# ns²) | 41.119 x 10^6 | 25.110 x 10^6 | 16.970 x 10^6 | 12.747 x 10^6 | 69.438 x 10^6 |
101
NOTICE
The rest of these slides are for extra information only and are not part of the lecture
107
Baugh-Wooley two's complement multiplier:
[Figure: The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier. An array of full adders (FA) sums the terms a_i b_j together with the complemented terms a4b0', a4b1', a4b2', a4b3', a0'b4 to a3'b4, a4', b4', the bits a4 and b4 and a constant 1, producing P9 to P0.]
108
Example of Baugh-Wooley Two's Complement Multiplication
Here x' denotes the complement of x, so a4b0' means a4 AND (NOT b0).

                                a4    a3    a2    a1    a0      A
X                               b4    b3    b2    b1    b0      B
                                a4b0' a3b0  a2b0  a1b0  a0b0
                          a4b1' a3b1  a2b1  a1b1  a0b1
                    a4b2' a3b2  a2b2  a1b2  a0b2
        a4b4  a4b3' a3b3  a2b3  a1b3  a0b3
        a4'   a3'b4 a2'b4 a1'b4 a0'b4
        b4'                     a4
+ 1                             b4
  p9    p8    p7    p6    p5    p4    p3    p2    p1    p0      P
A = -5, B = 13, P = -65:
            1 1 0 1 1   A = -5
X           0 1 1 0 1   B = 13
            0 1 0 1 1
          1 0 0 0 0
        0 1 0 1 1
    0 0 1 0 1 1
    0 0 0 0 0
    1       1
+ 1         0
  1 1 1 0 1 1 1 1 1 1   P = -65
A = 13, B = 5, P = 65:
            0 1 1 0 1   A = 13
X           0 0 1 0 1   B = 5
            0 1 1 0 1
          0 0 0 0 0
        0 1 1 0 1
    0 0 0 0 0 0
    1 0 0 0 0
    1       0
+ 1         0
  0 0 0 1 0 0 0 0 0 1   P = 65

A = -13, B = -5, P = 65:
            1 0 0 1 1   A = -13
X           1 1 0 1 1   B = -5
            0 0 0 1 1
          0 0 0 1 1
        1 0 0 0 0
    1 0 0 0 1 1
    0 1 1 0 0
    0       1
+ 1         1
  0 0 0 1 0 0 0 0 0 1   P = 65

A = 13, B = -5, P = -65:
            0 1 1 0 1   A = 13
X           1 1 0 1 1   B = -5
            0 1 1 0 1
          0 1 1 0 1
        0 0 0 0 0
    0 0 1 1 0 1
    1 0 0 1 0
    0       0
+ 1         1
  1 1 1 0 1 1 1 1 1 1   P = -65
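The Baugh-Wooley examples can be checked with a small model that adds up the same terms as the partial-product template: the plain products a_i b_j, the terms with one complemented factor, a4b4, the complemented sign bits, the sign bits themselves and the constant 1. This is a sketch generalised to n bits under that reading of the template; the function name `baugh_wooley` is an assumption.

```python
def baugh_wooley(a: int, b: int, n: int = 5) -> int:
    """Two's complement product of two n-bit operands via Baugh-Wooley terms."""
    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & mask) >> i) & 1 for i in range(n)]
    s = n - 1                                   # index of the sign bits
    total = 0
    for j in range(s):
        for i in range(s):
            total += (a_bits[i] & b_bits[j]) << (i + j)        # a_i b_j
        total += (a_bits[s] & (1 - b_bits[j])) << (s + j)      # a_s b_j'
    for i in range(s):
        total += ((1 - a_bits[i]) & b_bits[s]) << (s + i)      # a_i' b_s
    total += (a_bits[s] & b_bits[s]) << (2 * s)                # a_s b_s
    total += ((1 - a_bits[s]) + (1 - b_bits[s])) << (2 * s)    # a_s' and b_s'
    total += (a_bits[s] + b_bits[s]) << s                      # a_s and b_s
    total += 1 << (2 * n - 1)                                  # constant 1
    total &= (1 << (2 * n)) - 1                                # keep 2n bits
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


for a, b in ((-5, 13), (13, 5), (-13, -5), (13, -5)):
    print(a, b, baugh_wooley(a, b))
```

Because every term is non-negative, the whole sum can be formed by an ordinary array of adders, with no subtraction cells.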
110
Cluster Multipliers
[Figure: 8-bit cluster low-power multiplier. The multiplier and multiplicand are split into 4-bit groups that feed four 4-bit multipliers through 8-bit latches gated by the enable signals EN3, EN2, EN1, EN0 (with CLK and /CLR); a final addition stage combines the four 8-bit results into the 16-bit product P. A second diagram shows the circuit used to generate the enable signal.]
111
Cluster Multipliers
• Dividing the multiplication circuit into clusters (blocks) of smaller multipliers
• Applying clock gating techniques to disable the blocks that are producing a zero result.
• Features
– Low power (claims 13.4% savings)
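The clustering idea can be illustrated with a behavioural sketch: an 8-bit unsigned product is assembled from four 4-bit-by-4-bit cluster products, and a cluster is skipped when it would produce zero. The enable condition used here (both 4-bit operands non-zero) and the function name `cluster_multiply` are assumptions for illustration; the slides do not give the enable logic.

```python
def cluster_multiply(a: int, b: int) -> tuple[int, int]:
    """Return (product, number of enabled clusters) for 8-bit unsigned a and b."""
    a_hi, a_lo = (a >> 4) & 0xF, a & 0xF
    b_hi, b_lo = (b >> 4) & 0xF, b & 0xF
    clusters = (
        (a_lo, b_lo, 0),   # low  x low,  weight 2^0
        (a_hi, b_lo, 4),   # high x low,  weight 2^4
        (a_lo, b_hi, 4),   # low  x high, weight 2^4
        (a_hi, b_hi, 8),   # high x high, weight 2^8
    )
    product, enabled = 0, 0
    for x, y, shift in clusters:
        if x and y:        # assumed enable: skip clusters whose result is zero
            product += (x * y) << shift   # final addition stage
            enabled += 1
    return product, enabled


print(cluster_multiply(0x0F, 0x0F))   # small operands use a single cluster
print(cluster_multiply(0xAB, 0xCD))   # large operands use all four
```

The power saving comes from operands with small magnitude, where most clusters stay disabled.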
112
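Multiplexer-Based Array Multipliers

$$P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^{j}$$

$$Z_j = x_j Y_j + y_j X_j, \qquad X_j = x_{j-1} x_{j-2} \ldots x_0$$

(Y_j is defined in the same way from the bits of y.)
[Figure: triangular arrangement of the terms Z_1^0; Z_2^0, Z_2^1; Z_3^0, Z_3^1, Z_3^2; Z_4^0, Z_4^1, Z_4^2, Z_4^3, together with the Z_j and x_j y_j terms.]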
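The decomposition into x_j y_j terms and Z_j terms can be checked numerically. The sketch below assumes X_j and Y_j are the j least significant bits of x and y, and models the selection of Z_j as a 4-to-1 choice on (x_j, y_j), which is one natural reading of "multiplexer-based"; the function name `mux_based_product` is hypothetical.

```python
def mux_based_product(x: int, y: int, n: int) -> int:
    """Unsigned n-bit product built from x_j*y_j*2^(2j) and Z_j*2^j terms."""
    total = 0
    for j in range(n):
        xj, yj = (x >> j) & 1, (y >> j) & 1
        total += (xj & yj) << (2 * j)            # diagonal term x_j y_j 2^(2j)
        if j >= 1:
            low_mask = (1 << j) - 1
            X_j, Y_j = x & low_mask, y & low_mask    # bits j-1 .. 0
            # Z_j = x_j*Y_j + y_j*X_j, chosen by the pair (x_j, y_j)
            Z_j = (0, X_j, Y_j, X_j + Y_j)[(xj << 1) | yj]
            total += Z_j << j
    return total


print(mux_based_product(13, 11, 4))
```

No Booth-style encoding of the operands is involved: the bit pair (x_j, y_j) directly selects which operand prefix enters the sum.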
113
Multiplexer-Based Array Multipliers
Two types of cells:
Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry-save adder array.
Cell 2: produces the terms x_j y_j 2^j and includes a full adder of the carry-save adder array.
114
Multiplexer-Based Array Multipliers
• Characteristics
– Faster than Modified Booth
– Unlike Booth, does not require encoding logic
– Requires approximately N^2/2 cells
– Has a zigzag shape, thus not layout-friendly
115
Multiplexer-Based Array Multipliers
• Improvement
– More rectangular layout
– Saves up to 40 percent area without penalties
– Outperforms the modified Booth multiplier in both speed and power by 13% to 26%
116
Gray-Encoded Array Multiplier
Dec Hyb Dec Hyb Dec Hyb Dec Hyb
0 0000 4 0100 -8 1100 -4 1000
1 0001 5 0101 -7 1101 -3 1001
2 0011 6 0111 -6 1111 -2 1011
3 0010 7 0110 -5 1110 -1 1010
• 2's complement Hybrid Coding
– Consecutive values within each group of four differ in a single bit (each 2-bit group of the two's complement word is Gray-coded).
– Reduces the number of transitions, and thus power (for highly correlated streams).
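The Dec/Hyb table is consistent with Gray-coding each 2-bit group of the 4-bit two's complement word independently. The sketch below encodes under that assumption, which is inferred from the table rather than stated on the slide; the name `hybrid_encode` and the `cluster` parameter are hypothetical.

```python
def hybrid_encode(value: int, bits: int = 4, cluster: int = 2) -> int:
    """Gray-code every `cluster`-bit group of a two's complement word."""
    word = value & ((1 << bits) - 1)           # two's complement bit pattern
    group_mask = (1 << cluster) - 1
    code = 0
    for shift in range(0, bits, cluster):
        group = (word >> shift) & group_mask
        code |= (group ^ (group >> 1)) << shift   # binary-to-Gray on the group
    return code


for dec in (0, 1, 2, 3, 4, 5, 6, 7, -8, -7, -6, -5, -4, -3, -2, -1):
    print(dec, format(hybrid_encode(dec), "04b"))
```

Small steps between consecutive samples then toggle a single bit as long as they stay inside one group of four, which is where the switching activity is saved.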
118
Gray-Encoded Array Multiplier
• Characteristics
– Uses Gray code to reduce the switching activity of the multiplier
– Consumes 45.6% less power than Modified Booth
– Uses 26.4% more area than Modified Booth
119
Ultra-high Speed Parallel Multiplier
• How is ultra-high speed achieved?
– Based on the Modified Booth Algorithm and a tree structure (column compression)
– Chooses efficient counters (3:2 and 5:3)
– Uses the new compressor (20% faster)
– Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%)
120
Ultra-high Speed Parallel Multiplier
Calculate the partial products as soon as possible.
The final CLA is only 16 bits wide instead of 32 bits.
Divide into 3 rows or 5 rows only (most efficient).
[Figure: Calculation process using parallel counters in the 16x16 case.]
The total delay is reduced by about 30%.
121
ULLRLF Multiplier
• ULLRLF stands for Upper/Lower Left-to-Right Leapfrog.
• Combines the following techniques:
– Signal flow optimization in the [3:2] adder array for partial product reduction
– Left-to-right leapfrog (LRLF) signal flow
– Splitting of the reduction array into upper/lower parts
122
ULLRLF Multiplier
1) Signal flow optimization in the [3:2] adder array
-- For n = 32, the delay is reduced by 30 percent.
-- Power is saved as well.
PPij is always connected to pin A. Sin/Cin are connected to pins B/C; most Sin signals are connected to C.
123
ULLRLF Multiplier
2) Left-to-Right Leapfrog (LRLF) Structure
-- The signal delays are better balanced.
-- Low power.
The sum signals skip over alternate rows.
124
ULLRLF Multiplier
3) Upper/Lower Split Structure
-- Breaking the long data path into parallel short paths saves power.
-- The delay of the partial product reduction is reduced.
Only n+2 bits
125
ULLRLF Multiplier
[Figure: Floorplan of ULLRLF (n = 32)]
• ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
• With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
126
Signed Array Multiplier
[Figure: 32*32-bit array multiplier for signed numbers. Rows of AND gates, half adders (HA) and full adders (FA) form the carry-save adder stages for multiplier bits B0 to B31 and multiplicand bits A0 to A31, followed by a 32-bit carry look-ahead adder; outputs P63 to P0. One stage of the carry-save adder is marked.]
Stages 4 to 30: each stage includes 32 AND gates, 31 full adders, 1 half adder and 1 NOT gate.
127
Unsigned Array Multiplier
[Figure: 32*32-bit array multiplier for unsigned numbers. Rows of AND gates, half adders (HA) and full adders (FA) form the carry-save adder stages for multiplier bits B0 to B31 and multiplicand bits A0 to A31, followed by a 32-bit carry look-ahead adder; outputs P63 to P0. One stage of the carry-save adder is marked.]
Stages 4 to 30: each stage includes 32 AND gates, 31 full adders and 1 half adder.
129
Signed Modified Booth Multiplier
[Figure: 32*32-bit modified Booth multiplier for signed numbers. Booth encoders on B[1:0] with an appended 0, B[3:1], B[5:3], ..., B[31:5] generate X1[n], X2[n] and INVERT n, which drive rows of partial-product selectors (SEL) on A0 to A31 with sign extension on A31; the rows are summed by FA and HA cells and a 64-bit carry look-ahead adder produces P63 to P0. One stage is marked.]
Stages 3 to 15: each stage includes 33 PP selectors, 31 full adders, 1 half adder and 1 NOT gate.
131
Unsigned Modified Booth Multiplier
[Figure: 32*32-bit modified Booth multiplier for unsigned numbers. Booth encoders on B[1:0] with an appended 0, B[3:1], B[5:3], ..., B[i+1, i, i-1] and a final encoder on 0, 0, B[31] generate X1[i], X2[i] and S[i], which drive rows of partial-product selectors (SEL and SEL_END) on A0 to A31; the rows are summed by FA and HA cells and a 64-bit carry look-ahead adder produces P63 to P0. One stage is marked.]
Stages 3 to 15: each stage includes 33 PP selectors, 32 full adders, 1 half adder and 1 NOT gate.
132
Wallace Tree multipliers
[Figure: A[31:0] and B[31:0] generate 32 partial products, which are added in a Wallace tree adder; its outputs C[63:0] and S[63:0] feed a 64-bit carry look-ahead adder that produces P[63:0].]
133
Wallace Tree multipliers
[Figure: dot diagram of the reduction of the 32 partial products, levels 1 to 8.]
• Uses 3:2 counters and 2:2 counters
• Number of levels: log(32/2) / log(3/2) ≈ 6.8 is a lower bound; the reduction 32 → 22 → 15 → 10 → 7 → 5 → 4 → 3 → 2 takes 8 levels
• Irregular structure
• Fast
3:2 counter: three input bits of one column produce a Sum bit and a Carry bit.
2:2 counter: two input bits of one column produce a Sum bit and a Carry bit.
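The number of Wallace tree levels can be counted by repeatedly replacing every group of three rows with two (a row of 3:2 counters) until two rows remain. The sketch below does that and compares the result with the logarithmic estimate; the function names `wallace_levels` and `log_estimate` are assumptions for this example.

```python
import math


def wallace_levels(rows: int) -> int:
    """Count 3:2 reduction levels needed to get from `rows` rows down to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full group of 3 becomes 2
        levels += 1
    return levels


def log_estimate(rows: int) -> float:
    """Estimate log(rows/2) / log(3/2); a lower bound on the level count."""
    return math.log(rows / 2) / math.log(3 / 2)


for rows in (32, 16):
    print(rows, wallace_levels(rows), round(log_estimate(rows), 2))
```

The 16-row case corresponds to a 32-bit multiplier whose partial products have first been halved by modified Booth encoding.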
134
Wallace Tree multipliers
[Figure: 64-bit carry look-ahead adder, 2-level hierarchical. A carry propagate/generate unit derives P63 to P0 and G63 to G0 from A63 to A0 and B63 to B0. Eight 8-bit BCLA blocks produce the carries C7-C0 up to C63-C56 and the group signals PM0/GM0 to PM7/GM7; a further 8-bit BCLA takes Cin and the group signals and produces the block carries C8 to C56. A 64-bit summation unit combines P63 to P0 with C63 to C0 to give S63 to S0 and C64.]
136
Modified Booth-Wallace Tree Multipliers
• Uses 3:2 counters and 2:2 counters
• Number of levels: log(16/2) / log(3/2) ≈ 5.1 is a lower bound; the reduction 16 → 11 → 8 → 6 → 4 → 3 → 2 takes 6 levels
• Irregular structure
• Fast
• Less area
[Figure: dot diagram of the rearranged partial products and their reduction, levels 1 to 6.]
137
Twin pipe serial-parallel multipliers
[Figure: Block diagram of the 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. Two parallel-in, serial-out shift registers hold the odd bits (B31 B29 ... B3 B1) and the even bits (B30 B28 ... B2 B0) of B; the 32-bit twin pipe serial-parallel multiplier unit takes A31 A30 ... A1 A0 in parallel, with Load/Shift, Reset, Clock and Sign inputs; two serial-in, parallel-out shift registers collect the even product bits (P62 P60 ... P2 P0) and the odd product bits (P63 P61 ... P3 P1); Result_ready signals completion.]
138
Signed twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed numbers. The even data bits (... B2 B0 0 0 reset) and the odd data bits (... B3 B1 0 0 reset) are fed serially into two pipes of full adders (FA), half adders (HA) and D flip-flops gated by A31 down to A0 (28 more units are repeated); Clock and Reset control the pipes. A MUX driven by the rising_edge and falling_edge signals merges Evenproduct and Oddproduct into Product. The figure also shows the "Sign" control line and the sign-change hardware.]
139
Unsigned twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for unsigned numbers. The even data bits (... B2 B0 0 0 reset) and the odd data bits (... B3 B1 0 0 reset) are fed serially into two pipes of full adders (FA), half adders (HA) and D flip-flops gated by A31 down to A0 (28 more units are repeated); Clock and Reset control the pipes. A MUX driven by the rising_edge and falling_edge signals merges Evenproduct and Oddproduct into Product.]
• Does not need the "Sign" control line and the sign-change hardware