ECE 341 Lecture # 6web.cecs.pdx.edu/~zeshan/ece341_lecture6.pdf · Faster Blocked Carry-Lookahead...
ECE 341
Lecture # 6
Instructor: Zeshan Chishti
October 15, 2014
Portland State University
Lecture Topics
• Design of Fast Adders
  – Carry-Lookahead Adders (CLA)
  – Blocked Carry-Lookahead Adders
• Multiplication of Unsigned Numbers
  – Array Multiplier
  – Sequential Circuit Multiplier
• Reference:
  – Chapter 9: Sections 9.2 and 9.3
CLA Delay Calculation
Consider the expression:
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$
• All the $G_i$ and $P_i$ functions can be obtained in parallel in one gate delay
• AND terms in each $c_{i+1}$ calculation require one additional gate delay
• ORing the AND terms in each $c_{i+1}$ calculation requires one additional gate delay
Therefore,
• Total delay in calculating carry outputs = 1 + 1 + 1 = 3 gate delays
Sum outputs require one additional XOR delay after carries are computed
• Total delay in calculating sum outputs = 3 + 1 = 4 gate delays
n-bit CLA requires 4 gate delays independent of n
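The flat lookahead expression can be checked with a small bit-level sketch. It assumes the usual definitions $G_i = x_i \cdot y_i$ and $P_i = x_i \oplus y_i$, which this lecture does not restate; the function name `cla_add` is made up for illustration.

```python
def cla_add(x, y, c0=0, n=4):
    """Add two n-bit values with every carry taken from the flat lookahead expression."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    g = [a & b for a, b in zip(xb, yb)]   # assumed: G_i = x_i AND y_i
    p = [a ^ b for a, b in zip(xb, yb)]   # assumed: P_i = x_i XOR y_i
    c = [c0]
    for i in range(n):
        terms = []
        # AND terms: G_j gated by P_i ... P_(j+1), for j = i down to 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            terms.append(term)
        # Last AND term: P_i ... P_0 c_0
        tail = c0
        for k in range(i + 1):
            tail &= p[k]
        terms.append(tail)
        c.append(int(any(terms)))         # OR of all AND terms gives c_(i+1)
    s = [p[i] ^ c[i] for i in range(n)]   # one more XOR per sum bit
    total = sum(bit << i for i, bit in enumerate(s))
    return total, c[n]


print(cla_add(0b1011, 0b0110))  # sum bits and carry-out of 11 + 6
```

Each carry depends only on the inputs and $c_0$, never on a neighbouring carry, which is why the delay count above does not grow with n.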
CLA Fan-in Limitation
• Performing n-bit CLA in 4 gate delays, independent of n, is good only in theory
• In practice, CLA is limited by fan-in constraints
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$
• The OR gate and the last AND gate in the expression for $c_{i+1}$ each require i+2 inputs
• For a 4-bit CLA, the MSB carry-out ($c_4$) requires a fan-in of 5
• 5 is the practical fan-in limit for most gates
• In order to add operands larger than 4 bits, we can cascade multiple CLAs
• A cascade of CLAs is called a Blocked Carry-Lookahead adder
Blocked Carry-Lookahead Adder
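[Figure: cascade of 4-bit CLA blocks. The first block takes $x_{3-0}$, $y_{3-0}$ and $c_0$ and produces $s_0 \ldots s_3$ and $c_4$; the next takes $x_{7-4}$, $y_{7-4}$ and $c_4$ and produces $s_4 \ldots s_7$ and $c_8$; and so on up to the last block, which takes $x_{31-28}$, $y_{31-28}$ and $c_{28}$ and produces $s_{28} \ldots s_{31}$ and $c_{32}$.]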
After input operands (X, Y and $c_0$) are applied to the 32-bit adder:
• All the $P_i$ and $G_i$ terms in each CLA are calculated in parallel in 1 gate delay
• $c_4$ is available after 3 gate delays
• $c_8$ is available 2 gate delays after $c_4$: 3 + (1*2) = 5 gate delays
• $c_{12}$ is available (2*2) gate delays after $c_4$: 3 + (2*2) = 7 gate delays
• $c_{16}$ is available (3*2) gate delays after $c_4$: 3 + (3*2) = 9 gate delays
• $c_{32}$ is available (7*2) gate delays after $c_4$: 3 + (7*2) = 17 gate delays
• $s_{28}$, $s_{29}$, $s_{30}$, $s_{31}$ are available after 17 + 1 = 18 gate delays
Carry-outs ripple from one CLA block to the next. Can we avoid this rippling?
32-bit Blocked CLA composed of eight 4-bit CLA blocks
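The delay arithmetic for the cascade follows one pattern, sketched below. The function name `cascaded_cla_delays` is hypothetical, and the counting follows the slide: 3 gate delays for the first block's carry-out, then 2 more per later block.

```python
def cascaded_cla_delays(num_blocks):
    """Gate delays for a cascade of 4-bit CLA blocks, counted as on the slide."""
    carry_out = {}
    for k in range(1, num_blocks + 1):
        # First block: 1 (P, G) + 2 (AND-OR) = 3; each later block adds 2.
        carry_out[4 * k] = 3 + 2 * (k - 1)
    # The last block's internal carries arrive together with its carry-out,
    # so its sum bits need one more XOR delay.
    last_sum = carry_out[4 * num_blocks] + 1
    return carry_out, last_sum


carries, last_sum = cascaded_cla_delays(8)
print(carries)   # carry-out delay of each block, keyed by carry index
print(last_sum)  # delay of the most significant sum bits
```

The delay grows linearly with the number of blocks, which is the rippling the slide asks about.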
Faster Blocked Carry-Lookahead adder
Key Idea: Generate the carry outputs $c_4, c_8, c_{12}, \ldots$ of CLA blocks in parallel, similar to how $c_1, c_2, c_3, c_4$ are generated in parallel within a CLA block
• Carry-out from a 4-bit block can be given as:
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$
• This can be re-written as:
$c_4 = G_0^{1} + P_0^{1} c_0$
where $G_0^{1} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ and $P_0^{1} = P_3 P_2 P_1 P_0$
We can similarly compute $G_1^{1}, G_2^{1}, G_3^{1}, \ldots$
$G_i^{1}$ and $P_i^{1}$ are first-level generate and propagate functions, where i denotes the CLA block
• $G_k^{1} = 1$ implies that the kth CLA block generates a carry
• $P_k^{1} = 1$ implies that the kth CLA block propagates a carry
Carry-out of kth block $= G_k^{1} + P_k^{1} G_{k-1}^{1} + P_k^{1} P_{k-1}^{1} G_{k-2}^{1} + \cdots + P_k^{1} P_{k-1}^{1} \cdots P_1^{1} G_0^{1} + P_k^{1} P_{k-1}^{1} \cdots P_0^{1} c_0$
• For example:
$c_{16} = G_3^{1} + P_3^{1} G_2^{1} + P_3^{1} P_2^{1} G_1^{1} + P_3^{1} P_2^{1} P_1^{1} G_0^{1} + P_3^{1} P_2^{1} P_1^{1} P_0^{1} c_0$
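A short sketch can confirm that the block-level expression gives the same carries as ordinary addition. It assumes bit-level $G_i = x_i \cdot y_i$ and $P_i = x_i \oplus y_i$ (not restated in this lecture); the function names are hypothetical.

```python
def block_generate_propagate(g, p):
    """First-level (generate, propagate) of one 4-bit block from its bit-level g, p."""
    big_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    big_p = p[3] & p[2] & p[1] & p[0]
    return big_g, big_p


def block_carries(x, y, c0=0, num_blocks=4):
    """Return [c4, c8, ...] computed only from first-level generates and propagates."""
    n = 4 * num_blocks
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(n)]   # assumed G_i
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(n)]   # assumed P_i
    first = [block_generate_propagate(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4])
             for k in range(num_blocks)]
    carries = []
    for k in range(num_blocks):
        # Last term of the expression: P_k ... P_0 c_0 (block-level propagates)
        carry = c0
        for j in range(k + 1):
            carry &= first[j][1]
        # Remaining terms: G_j gated by P_k ... P_(j+1)
        for j in range(k + 1):
            term = first[j][0]
            for m in range(j + 1, k + 1):
                term &= first[m][1]
            carry |= term
        carries.append(carry)
    return carries


print(block_carries(0xFFFF, 0x0001))  # a carry that propagates through every block
```

No block carry in this sketch is computed from another block carry; each depends only on the first-level signals and $c_0$.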
Blocked CLA with First-level Propagates and Generates
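[Figure: 16-bit adder built from four 4-bit adders and a shared carry-lookahead logic block. The 4-bit adders take $x_{3-0}, y_{3-0}$; $x_{7-4}, y_{7-4}$; $x_{11-8}, y_{11-8}$; and $x_{15-12}, y_{15-12}$, with carry-ins $c_0$, $c_4$, $c_8$ and $c_{12}$, and produce $s_{3-0}$, $s_{7-4}$, $s_{11-8}$ and $s_{15-12}$. Each adder also outputs its first-level propagate and generate signals $P_k^{1}$, $G_k^{1}$ for k = 0 to 3 (drawn with a superscript I in the figure). The carry-lookahead logic is labeled with these signals, $c_0$, and the carries $c_4$, $c_8$, $c_{12}$ and $c_{16}$.]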
After input operands (X, Y and $c_0$) are applied to the above 16-bit adder:
• $P_i$ and $G_i$ terms within each CLA calculated in parallel in 1 gate delay
• First-level generates ($G_k^{1}$) available after 1 + 2 = 3 gate delays
• Carry-outs of CLA blocks ($c_4$, $c_8$, $c_{12}$, $c_{16}$) available after 3 + 2 = 5 gate delays
• Carries within CLA blocks (such as $c_{15}$) available after 5 + 2 = 7 gate delays
• Sum outputs (such as $s_{15}$) available after 7 + 1 = 8 gate delays
• Compare this with the blocked CLA formed by cascading, where $c_{15}$ and $s_{15}$ required 9 and 10 gate delays respectively
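The two delay chains can be laid side by side in a few lines. This only repeats the slide's arithmetic; the function names and dictionary keys are made up for illustration.

```python
def two_level_cla16_delays():
    """Gate-delay milestones for the 16-bit adder with first-level lookahead."""
    bit_pg = 1                          # all P_i and G_i, in parallel
    first_level = bit_pg + 2            # AND-OR logic forming the block generates
    block_carries = first_level + 2     # c4, c8, c12, c16 from the lookahead logic
    inner_carries = block_carries + 2   # carries inside a block, such as c15
    sums = inner_carries + 1            # final XOR, such as s15
    return {"c16": block_carries, "c15": inner_carries, "s15": sums}


def cascaded_cla16_delays():
    """Same milestones when four 4-bit CLA blocks are simply cascaded."""
    c4 = 3                # 1 (P, G) + 2 (AND-OR)
    c12 = c4 + 2 * 2      # two more blocks, 2 gate delays each
    c15 = c12 + 2         # lookahead inside the last block
    s15 = c15 + 1         # final XOR
    return {"c15": c15, "s15": s15}


print(two_level_cla16_delays())
print(cascaded_cla16_delays())
```

At 16 bits the saving is 2 gate delays; the gap widens for wider adders because the cascaded delay grows with the number of blocks.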
Multiplication
Multiplication of Unsigned Numbers
Product of two n-bit numbers is at most a 2n-bit number
Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.
Multiplication of Unsigned Numbers (contd.)
• In hand multiplication, we add the shifted versions of multiplicand at the end (column-by-column)
– Alternative would be to accumulate partial products at each stage (row-by-row)
Multiplication logic for two n-bit numbers can be implemented as follows:
Initialize the partial product PP0 to a value of 0
Start from the LSB of the multiplier and proceed towards the MSB, one bit at a time. For each bit position i of the multiplier, perform the following step:
If the ith bit of the multiplier is 1, shift the multiplicand left by i bit positions and add it to PPi in order to obtain PP(i+1); otherwise PP(i+1) = PPi
After n steps, the partial product PPn represents the final product
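The row-by-row procedure above can be sketched directly with integer operations. The function name `shift_add_multiply` is hypothetical, and Python integers stand in for the n-bit hardware values.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Accumulate partial products row by row, starting from the multiplier's LSB."""
    pp = 0                                    # PP0 = 0
    for i in range(n):
        if (multiplier >> i) & 1:             # ith multiplier bit is 1
            pp += multiplicand << i           # add multiplicand shifted left by i
        # otherwise PP(i+1) = PPi, so nothing to do
    return pp                                 # PPn is the final product


product = shift_add_multiply(0b1101, 0b1011, 4)
print(bin(product))
```

For n-bit operands the result always fits in 2n bits, matching the earlier statement about product width.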
Multiplication of Unsigned Numbers
[Figure: typical multiplication cell, built around a full adder (FA). Labeled signals: bit of incoming partial product (PPi), jth multiplicand bit, ith multiplier bit, carry in, carry out, and bit of outgoing partial product (PP(i+1)).]
Combinatorial Array Multiplier
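[Figure: 4 × 4 array of multiplication cells. Multiplicand bits $m_3 \ldots m_0$ enter at the top together with the initial partial product (PP0) = 0 0 0 0; multiplier bits $q_0 \ldots q_3$ each feed one row of cells; the rows produce PP1, PP2, PP3 and the product bits $p_0 \ldots p_7$. Product = $p_7, p_6, \ldots, p_0$.]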
Multiplicand is shifted by displacing it through an array of adders
Each box represents a multiplication cell
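A behavioural sketch of the array can make the cell wiring concrete. It assumes each cell adds the incoming partial-product bit, the AND of the jth multiplicand bit and ith multiplier bit, and a carry that ripples along the row; the slides show the cell's signals but this exact wiring is an assumption, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(m_bits, q_bits):
    """Multiply two n-bit values given as bit lists, least significant bit first."""
    n = len(m_bits)
    pp = [0] * n                    # PP0 = 0
    product = []
    for i in range(n):              # one row of cells per multiplier bit
        carry = 0
        row = []
        for j in range(n):          # one cell per multiplicand bit
            s, carry = full_adder(pp[j], m_bits[j] & q_bits[i], carry)
            row.append(s)
        product.append(row[0])      # lowest bit of the row is final: p_i
        pp = row[1:] + [carry]      # the rest moves one position toward the LSB
    product.extend(pp)              # the last row supplies the upper n bits
    return product


bits = array_multiply([1, 0, 1, 1], [1, 1, 0, 1])   # 13 x 11, LSB first
print(sum(b << k for k, b in enumerate(bits)))
```

The nested loops mirror the n × n grid of cells, which is where the quadratic gate count comes from.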
Combinatorial Array Multiplier (cont.)
• Array multipliers are highly inefficient:
  – Need n n-bit adders, so the gate count is proportional to $n^2$
  – Impractical for large numbers such as 32-bit or 64-bit numbers typically used in computers
  – Perform only one function, namely, unsigned integer product
• Solution: Improve gate efficiency by using a mixture of combinatorial array techniques and sequential techniques
  – Instead of n n-bit adders, use one n-bit adder
  – Use a register to hold the accumulated partial product
  – This is called a sequential multiplier
Sequential Multiplication
• Recall the rule for generating partial products:
  – If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to the current partial product.
  – Multiplicand is shifted left when being added to the partial product
Key Observation:
• Adding a left-shifted multiplicand to an unshifted partial product is equivalent to adding an unshifted multiplicand to a right-shifted partial product
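One way to see why the observation holds is a short derivation. The symbols $q_i$ (the ith multiplier bit), $M$ (the multiplicand) and $R_i = PP_i / 2^i$ (the partial product viewed after i right shifts) are notation introduced here for illustration, not taken from the slides.

$$
\begin{aligned}
PP_{i+1} &= PP_i + q_i \, M \, 2^{i} \\
R_{i+1} = \frac{PP_{i+1}}{2^{i+1}} &= \frac{1}{2}\left(\frac{PP_i}{2^{i}} + q_i M\right) = \frac{R_i + q_i M}{2}
\end{aligned}
$$

So each step can add the unshifted multiplicand and then shift the running result right by one position, and after n steps $PP_n = 2^n R_n$. Under this view the adder only ever needs to be n bits wide.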
Sequential Circuit Multiplier
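[Figure: sequential circuit multiplier. Labeled elements: Register A (initially 0), bits $a_{n-1} \ldots a_0$; Multiplier Q, bits $q_{n-1} \ldots q_0$; carry bit C; a "Shift right" path across C, A and Q; an n-bit Adder; Multiplicand M, bits $m_{n-1} \ldots m_0$; a MUX with a 0 input; and a Control sequencer, fed by $q_0$, that issues the Add/Noadd control.]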
Sequential Multiplication Algorithm
• Initialization:
  – Load multiplicand in “M” register, multiplier in “Q” register
  – Initialize “C” and “A” registers to all zeroes
• Repeat the following steps “n” times, where “n” is the number of bits in the multiplier
  – If (LSB of Q register == 1), A = A + M (carry-out goes to “C” register)
  – Treat the C, A and Q registers as one contiguous register and shift that register’s contents right by one bit position
• After the completion of “n” steps
  – Register “A” contains high-order half of product
  – Register “Q” contains low-order half of product
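The register-level steps above can be sketched as follows. The function name `sequential_multiply` and the returned trace are hypothetical conveniences; plain integers model the C, A and Q registers.

```python
def sequential_multiply(m, q, n):
    """Multiply two n-bit unsigned values using the C, A, Q add-and-shift steps."""
    mask = (1 << n) - 1
    c, a = 0, 0                               # C and A start at zero
    trace = [(c, a, q)]                       # register contents after each cycle
    for _ in range(n):
        if q & 1:                             # LSB of Q is 1: A = A + M
            total = a + m
            c, a = total >> n, total & mask   # carry-out goes to C
        # Shift C, A, Q right by one position as one contiguous register
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
        trace.append((c, a, q))
    return (a << n) | q, trace                # A holds the high half, Q the low half


product, trace = sequential_multiply(0b1101, 0b1011, 4)
for c, a, q in trace:
    print(c, format(a, "04b"), format(q, "04b"))
print(product)
```

Only one n-bit addition happens per cycle, so a single n-bit adder is reused n times instead of instantiating n adders.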
Sequential Multiplication Example
M = 1 1 0 1, initial Q = 1 0 1 1

| Step | Operation | C | A | Q |
|---|---|---|---|---|
| Initial configuration | | 0 | 0 0 0 0 | 1 0 1 1 |
| First cycle | A += M | 0 | 1 1 0 1 | 1 0 1 1 |
| | Shift | 0 | 0 1 1 0 | 1 1 0 1 |
| Second cycle | A += M | 1 | 0 0 1 1 | 1 1 0 1 |
| | Shift | 0 | 1 0 0 1 | 1 1 1 0 |
| Third cycle | No add | 0 | 1 0 0 1 | 1 1 1 0 |
| | Shift | 0 | 0 1 0 0 | 1 1 1 1 |
| Fourth cycle | A += M | 1 | 0 0 0 1 | 1 1 1 1 |
| | Shift | 0 | 1 0 0 0 | 1 1 1 1 |

Product = A, Q = 1 0 0 0 1 1 1 1