Advanced Digital IC Design
Arithmetic
Number Representation
Addition
Multiplication
Division
Distributed Arithmetic
Newton Raphson
CORDIC
Unsigned Number Representation
Fixed radix (base) systems
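The digits in a radix-$r$ system: $a_i \in \{0, 1, 2, \ldots, r-1\}$
A number described in a fixed point positional number system (the digits after the radix point form the fractional part):
$a_{k-1} a_{k-2} \ldots a_1 a_0 \,.\, a_{-1} \ldots a_{-l}$
$\sum_{i=-l}^{k-1} a_i r^i = a_{k-1} r^{k-1} + a_{k-2} r^{k-2} + \cdots + a_1 r^1 + a_0 r^0 + a_{-1} r^{-1} + \cdots + a_{-l} r^{-l}$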
Example: Unsigned Number
$a_i \in \{0, 1, 2, \ldots, 9\}$ in radix 10:
$\sum_{i=-l}^{k-1} a_i 10^i = a_{k-1} 10^{k-1} + a_{k-2} 10^{k-2} + \cdots + a_1 10^1 + a_0 10^0 + a_{-1} 10^{-1} + \cdots + a_{-l} 10^{-l}$
$a_i \in \{0, 1\}$ in radix 2:
$\sum_{i=-l}^{k-1} a_i 2^i = a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^1 + a_0 2^0 + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l}$
Signed Digit Number Representation
The digits in a radix-$r$ system: $a_i \in \{-\alpha, \ldots, 0, \ldots, r-1-\alpha\}$
Example Radix 10: $a_i \in \{-4, -3, \ldots, 0, \ldots, 4, 5\}$, value $\sum_{i=-l}^{k-1} a_i \times 10^i$
$(3\ \bar{1}\ 5)_{10} = 3 \times 10^2 - 1 \times 10^1 + 5 \times 10^0 = 300 - 10 + 5 = 295$
$(3\,.\,\bar{1}\ 5)_{10} = 3 \times 10^0 - 1 \times 10^{-1} + 5 \times 10^{-2} = 3 - 0.1 + 0.05 = 2.95$
Here $\bar{1}$ denotes the digit $-1$.
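The positional sums above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `positional_value` and the digit-list layout are assumptions made for illustration. It evaluates $\sum a_i r^i$ exactly, and because it never restricts the digit range it also covers the signed digit example.

```python
from fractions import Fraction

def positional_value(int_digits, frac_digits, radix):
    """Evaluate a_{k-1}..a_0 . a_{-1}..a_{-l} in the given radix.

    int_digits is listed most significant digit first; frac_digits starts
    at a_{-1}. Digits may be negative (signed digit representation).
    """
    value = Fraction(0)
    k = len(int_digits)
    for pos, digit in enumerate(int_digits):
        value += digit * Fraction(radix) ** (k - 1 - pos)   # weight r^(k-1-pos)
    for pos, digit in enumerate(frac_digits, start=1):
        value += digit * Fraction(radix) ** (-pos)          # weight r^(-pos)
    return value

print(positional_value([1, 0, 1], [1], 2))      # binary 101.1
print(positional_value([3, -1, 5], [], 10))     # signed digit example
print(positional_value([3], [-1, 5], 10))       # with a fractional part
```

Using `Fraction` keeps the fractional weights exact, so the radix-10 example is not disturbed by binary floating point rounding.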
Modified Booth’s recoding - a signed digit radix 4 representation
Two’s Complement
The digits in a radix 2 system: $a_i \in \{0, 1\}$
A number described in a fixed point positional number system (the digits after the binary point form the fractional part):
$a_{k-1} a_{k-2} \ldots a_1 a_0 \,.\, a_{-1} \ldots a_{-l}$
$-a_{k-1} \times 2^{k-1} + \sum_{i=-l}^{k-2} a_i \times 2^i = -a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^1 + a_0 2^0 + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l}$
Sign Bits
$-a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^1 + a_0 2^0$
The sign bit $a_{k-1}$ carries the negative weight $-2^{k-1}$.
Sign Extension in Two’s Complement
$-a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^1 + a_0 2^0 =$
$-a_{k-1} 2^{k} + a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^1 + a_0 2^0$
Example:
10010 = 110010 = 1110010 = 11110010
00010 = 000010 = 0000010 = 00000010
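The sign extension examples can be verified with a short script. This is an illustrative sketch only; the helper names `twos_complement_value` and `sign_extend` are not from the slides, and the words are assumed to be integers written MSB first as bit strings.

```python
def twos_complement_value(bits):
    """Value of an integer two's complement word given as a string, MSB first."""
    k = len(bits)
    value = -int(bits[0]) * 2 ** (k - 1)          # sign bit has weight -2^(k-1)
    for pos, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (k - 1 - pos)    # remaining bits are positive
    return value

def sign_extend(bits, width):
    """Repeat the sign bit on the left until the word is `width` bits long."""
    return bits[0] * (width - len(bits)) + bits

for word in ("10010", "00010"):
    extended = sign_extend(word, 8)
    print(word, extended, twos_complement_value(word), twos_complement_value(extended))
```

Both the original and the extended word evaluate to the same number, which is the identity the slide expresses.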
Addition
Addition is the most common arithmetic operation in digital processors
Also the basis of most other arithmetic operations like
multiplication
division
square root
…
Addition
Ripple Carry Adder (RCA)
[Figure: four full adders (FA) in a chain with inputs A0..A3 and B0..B3, carry in Ci,0, carries Co,0..Co,3, and sums S0..S3]
Critical Path through all adder cells
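A behavioural model makes the carry chain explicit. The sketch below is an illustration under the assumption that bits are passed as lists with the LSB first; the function names are hypothetical.

```python
def full_adder(a, b, c_in):
    """One FA cell: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equally long bit lists (LSB first) by rippling the carry."""
    sums = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        # Each cell must wait for the carry of the previous cell,
        # which is why the critical path runs through all adder cells.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# 11 + 5 = 16: all four sum bits are 0 and the carry out is 1
print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))
```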
Addition: Sign Extension
[Figure: ripple carry adder in which the sign bits (S) of both operands are repeated into an extra FA, giving sums S0..S4]
Adding More Numbers
Carry Ripple Adders in a Chain
[Figure: three ripple carry adder rows (HA FA FA FA) connected in a chain, adding A, B, C and D to give S]
Critical Path through 6 adder cells
Adding More Numbers
Carry Ripple Adders in a Tree
[Figure: ripple carry adder rows (HA FA FA FA) arranged as a tree; A + B and C + D are added in parallel and their results are added to give S]
Critical Path through 5 adder cells
Adding More Numbers
Carry Save Adder (CSA)
[Figure: carry save rows of FA and HA cells adding A, B, C and D, followed by a vector merging adder that produces S]
Only One Critical Path (through 5 adder cells)
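The idea of carry save addition can be sketched on whole words: three operands are reduced to a sum vector and a carry vector without any carry propagation, and only the final vector merging adder propagates carries. The code below is an illustration for non-negative integers; the names `carry_save_add` and `add_many` are assumptions.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) vector pair.

    Every bit position is an independent FA, so no carry ripples here.
    """
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position up
    return sum_vec, carry_vec

def add_many(values):
    """Accumulate operands in carry save form, then merge once."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c          # the vector merging adder

print(carry_save_add(5, 9, 3))     # sum and carry vectors
print(add_many([5, 9, 3, 12]))     # 29
```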
Pipelining
Ripple Carry Adders in a Chain
[Figure: the chain of ripple carry adder rows (HA FA FA FA) with pipeline registers (R) inserted between the rows]
Critical Path through 4 adder cells
R = Register
Pipelining
Ripple Carry Adders in a Tree
[Figure: the tree of ripple carry adder rows (HA FA FA FA) with one row of pipeline registers (R)]
Lower Latency than the Chain Adder
R = Register
Latency
Latency: the number of clock cycles it takes before we see the result
Latency time: latency × cycle time
[Figure: the pipelined chain adder and the pipelined tree adder side by side, with their pipeline registers (R)]
Carry Save Adder (CSA)
Pipelining
[Figure: carry save adder rows separated by pipeline registers (R), followed by a vector merging adder that produces S]
Registers are needed for both sum and carry
Critical Path: 1 cell in CSA and 3 in vector merging
Carry Save Adder (CSA) with Carry Look Ahead (CLA)
Pipelining for fast addition
[Figure: pipelined carry save adder rows with registers (R), merged by CLA blocks (CLA merging adder) that produce S]
Very Short Critical Path
R = Register
Vector Merging Adder - CLA
The CLA is done in blocks
A common maximum is 4 bits per block
Larger blocks are too complex
[Figure: three 3-bit CLA blocks producing Co,2, Co,5 and Co,8 from A0..A8 and B0..B8; each bit position n forms Sn from Cn and Pn]
Generate & Propagate
A  B  Ci   S  Co
0  0  0    0  0    Delete
0  0  1    1  0    Delete
0  1  0    1  0    Propagate
0  1  1    0  1    Propagate
1  0  0    1  0    Propagate
1  0  1    0  1    Propagate
1  1  0    0  1    Generate
1  1  1    1  1    Generate
[Figure: FA with inputs A, B, Ci and outputs S, Co]
Functions
Delete: $D = \overline{A}\,\overline{B}$
Propagate: $P = A \oplus B$
Generate: $G = AB$
$S = A \oplus B \oplus C_i = P \oplus C_i$
A  B  A+B
0  0  0
0  1  1
1  0  1
1  1  1
$C_o = AB + AC_i + BC_i = AB + (A + B)C_i = AB + (AB + A \oplus B)C_i = G + PC_i$
The $AB$ term inside the parentheses is redundant.
Carry Look Ahead (CLA)
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 C_{o,0} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$
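The expanded carry equations can be checked against ordinary addition. The following sketch models one 4-bit CLA block with bit lists (LSB first); the function names are illustrative, not taken from the slides.

```python
def cla_carries(a_bits, b_bits, c_in):
    """Carries Co,0..Co,3 of a 4-bit block from the expanded equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c0 = g[0] | (p[0] & c_in)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c0, c1, c2, c3]

def cla_add(a_bits, b_bits, c_in=0):
    """4-bit addition where every carry is computed directly from G, P and Ci,0."""
    carries = cla_carries(a_bits, b_bits, c_in)
    carry_ins = [c_in] + carries[:3]
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry_ins)]
    return sums, carries[3]

print(cla_add([1, 1, 0, 1], [1, 0, 1, 0]))   # 11 + 5
```

No carry equation depends on another carry output, which is what removes the ripple; the price is the growing gate fan-in seen in the last equation.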
Carry Look Ahead (CLA): Precharged
$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$
[Figure: precharged CLA gate clocked by φ with inputs Ci, G0..G3 and P0..P3, together with an alternative structure]
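Carry Look Ahead (CLA): Manchester
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$
[Figure: Manchester carry chain between Ci,0 and Co,3 with pass transistors controlled by P0..P3, generate inputs G0..G3, and supply VDD]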
Logarithmic Adder
Look ahead one step:
$C_{o,1} = G_1 + P_1 C_{o,0} = (G_1 + P_1 G_0) + P_1 P_0 C_{i,0} = G_{1:0} + P_{1:0} C_{i,0}$
with generate $G_{1:0} = G_1 + P_1 G_0$ and propagate $P_{1:0} = P_1 P_0$
Look ahead two steps:
$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 (G_1 + G_0 P_1) + P_2 P_1 P_0 C_{i,0} = G_2 + P_2 G_{1:0} + P_2 P_{1:0} C_{i,0}$
$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 (G_2 + G_1 P_2) + P_3 P_2 P_1 C_{i,1} = G_3 + P_3 G_{2:1} + P_3 P_{2:1} C_{o,0}$
with $G_{2:1} = G_2 + G_1 P_2$ and $P_{2:1} = P_2 P_1$
Logarithmic Adder, 4 bit
[Figure: four P&G creation cells (inputs A0,B0 .. A3,B3, outputs G0,P0 .. G3,P3) followed by $G_{i:j}$, $P_{i:j}$ creation cells that produce the carries]
$G_{1:0} = G_1 + G_0 P_1$, $P_{1:0} = P_1 P_0$
$G_{2:1} = G_2 + G_1 P_2$, $P_{2:1} = P_2 P_1$
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_{1:0} + P_{1:0} C_{i,0}$
$C_{o,2} = G_2 + P_2 G_{1:0} + P_2 P_{1:0} C_{i,0}$
$C_{o,3} = G_3 + P_3 G_{2:1} + P_3 P_{2:1} C_{o,0}$
Logarithmic Adder, 16 bit
[Figure: P&G creation cells for A0,B0 .. A15,B15 followed by one step, two step, four step and eight step look ahead stages that produce Co,0 .. Co,15]
An N bit adder is computed in $\log_2(N)$ stages
Kogge-Stone adder
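The look ahead stages with spans 1, 2, 4 and 8 can be modelled in a few lines. This is a behavioural sketch of a Kogge-Stone style prefix computation under the assumption that the carry in is folded into the generate signal of bit 0; the function name and the integer interface are illustrative.

```python
def kogge_stone_add(a, b, width=16, c_in=0):
    """Add two width-bit unsigned integers with log2(width) look ahead stages."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate
    p_bit = p[:]                 # per-bit propagate, needed later for the sum
    g[0] |= p[0] & c_in          # assumption: fold the carry in into G0
    step = 1
    while step < width:          # spans 1, 2, 4, 8 for 16 bits
        new_g, new_p = g[:], p[:]
        for i in range(step, width):
            new_g[i] = g[i] | (p[i] & g[i - step])   # G_{i:j} creation
            new_p[i] = p[i] & p[i - step]            # P_{i:j} creation
        g, p = new_g, new_p
        step *= 2
    carries = [c_in] + g[:-1]    # carry into bit i equals Co,i-1
    total = 0
    for i in range(width):
        total |= (p_bit[i] ^ carries[i]) << i
    return total, g[-1]

print(kogge_stone_add(12345, 23456))
print(kogge_stone_add(0xFFFF, 1))
```

After the last stage, `g[i]` holds the carry out of bit i, so the number of stages grows with the logarithm of the word length.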
Other logarithmic adders
Kogge-Stone: 17 cells, fan out 2
Brent-Kung: 12 cells
Sklansky adder: 12 cells, large fan out
Carry Bypass
A  B  Ci   S  Co
0  0  0    0  0    Delete
0  0  1    1  0    Delete
0  1  0    1  0    Propagate
0  1  1    0  1    Propagate
1  0  0    1  0    Propagate
1  0  1    0  1    Propagate
1  1  0    0  1    Generate
1  1  1    1  1    Generate
[Figure: FA with inputs A, B, Ci and outputs S, Co]
$A \ne B$ gives $C_o = C_i$
Bypass carry if $P = 1$, where $P = A \oplus B$ (propagate)
$A = B$ gives $C_o$ independent of $C_i$
Carry Bypass
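[Figure: two FAs in a chain with inputs A0, B0 and A1, B1, propagate signals P0 and P1, and carries Ci,0, Co,0, Co,1]
$C_{o,1} = C_{i,0}$ if $A_0 \ne B_0$ and $A_1 \ne B_1$, that is if $P_0 P_1 = 1$
otherwise $C_{o,1}$ is independent of $C_{i,0}$
Bypass carry when $P_0 P_1 = 1$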
Carry Bypass Adder
[Figure: four FAs with generate and propagate signals G0..G3 and P0..P3, internal carries C0..C3, sums S0..S3, and a bypass path to Co,3]
Bypass if $P_0 P_1 P_2 P_3 = 1$
Otherwise $C_{o,3}$ is independent of the carry in
Carry Bypass Adder
If A = B in at least one adder cell ⇒ Co is not dependent on Ci
If A ≠ B in all adder cells ⇒ bypass the carry
[Figure: 16-bit carry bypass adder built from four 4-bit blocks (Setup, FA FA FA FA) producing S0..S15]
Carry Select
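[Figure: one carry select stage; a setup block feeds two 4-bit FA chains, one with carry in "0" and one with carry in "1"; the incoming carry Ci,k selects the carry out Co,k+3 and drives the sum generation for S0..S3]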
Carry Select: Critical Path
[Figure: 16-bit carry select adder with four 4-bit stages; each stage holds a setup block, a "0" FA chain, a "1" FA chain and sum generation; the carries Ci,0, Co,3, Co,7 and Co,11 pass through the multiplexers to produce S0..S15]
Large area (two adders not needed in first stage)
Linear Carry Select
[Figure: 16-bit linear carry select adder with four 4-bit stages, carries Ci,0, Co,3, Co,7 and Co,11, and sum generation for S0..S15]
The same number of bits in each stage
Square Root Carry Select
[Figure: square root carry select adder; the stages grow in width, with carries Ci,0, Co,1, Co,4 and Co,8 selecting between the "0" and "1" results, and sum generation for S0..S13]
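Both carry select variants can be modelled with one function that takes the stage widths as a parameter. This is a behavioural sketch on integers; the function names are illustrative, and the default widths (2, 3, 4, 5) are an assumption inferred from the carry labels Co,1, Co,4 and Co,8 in the figure.

```python
def add_block(a, b, c_in, width):
    """Reference adder for one stage: returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, block_widths=(2, 3, 4, 5), c_in=0):
    """Each stage precomputes results for carry in 0 and 1, then selects."""
    result, shift, carry = 0, 0, c_in
    for w in block_widths:
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        sum0, c0 = add_block(a_blk, b_blk, 0, w)   # the "0" adder
        sum1, c1 = add_block(a_blk, b_blk, 1, w)   # the "1" adder
        # The incoming carry only controls a multiplexer.
        blk_sum, carry = (sum1, c1) if carry else (sum0, c0)
        result |= blk_sum << shift
        shift += w
    return result, carry

print(carry_select_add(10000, 6000))                               # square root stages
print(carry_select_add(0xABCD, 0x1234, block_widths=(4, 4, 4, 4)))  # linear stages
```

In hardware the two adders of a stage work in parallel, so the later, wider stages of the square root version finish at about the time their select signal arrives.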
Multiplication
The steps involved in multiplication
Partial product generation
Accumulate the partial products
The maximum speed is $O(\log_2 W)$
Multipliers
Iterative multipliers
One or a few partial products are processed each clock cycle
Small area
Slow
Hardware mapped multipliers
A complete multiplication each clock cycle
Large area
Fast
Iterative Multiplication
A simple multiplier
Applicable to both Carry Ripple and Carry Save
$P = A \times B = \sum_{i=0}^{3} A \times b_i \times 2^i$
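The sum above maps directly onto a shift and add loop, one partial product per iteration. The sketch below is illustrative; the name `iterative_multiply` and the default width of 4 bits are assumptions.

```python
def iterative_multiply(a, b, width=4):
    """Unsigned multiplication: accumulate A * b_i * 2^i for each bit of B."""
    product = 0
    for i in range(width):
        b_i = (b >> i) & 1                 # select bit i of the multiplier
        partial_product = (a * b_i) << i   # A * b_i shifted to weight 2^i
        product += partial_product         # accumulate
    return product

print(iterative_multiply(5, 7))   # 35
```

An iterative multiplier in hardware would process one such partial product per clock cycle with a single adder.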
Unsigned Multiplication
$P = A \times b_3 2^3 + A \times b_2 2^2 + A \times b_1 2^1 + A \times b_0 2^0$
                     a3    a2    a1    a0
                ×    b3    b2    b1    b0
                     a3b0  a2b0  a1b0  a0b0
               a3b1  a2b1  a1b1  a0b1
         a3b2  a2b2  a1b2  a0b2
   a3b3  a2b3  a1b3  a0b3
   p6    p5    p4    p3    p2    p1    p0
Shifted partial products
Unsigned Multiplication
$P = A \times b_3 2^3 + A \times b_2 2^2 + A \times b_1 2^1 + A \times b_0 2^0$
[Figure: the same partial product array, with the accumulated rows in the multiplier labelled $pp_3^1 \ldots pp_0^1$ after the $b_1$ row and $pp_3^2 \ldots pp_0^2$ after the $b_2$ row, ending in p6..p0]
Array Multiplier
[Figure: basic cells; FA with inputs xi, yj, Ci and outputs S, Co; HA with inputs xi, yj and outputs S, Co; one partial product row HA FA FA HA over x3..x0 and yj]
Array Multiplier
[Figure: 4×4 array multiplier with inputs a3..a0 and b0..b3, rows HA FA FA HA / FA FA FA HA / FA FA FA HA, and outputs p6..p0; each cell performs the bit multiplication $a_j b_i$ and adds it to $pp_{j-1}^{i-1}$ with carry in and carry out, giving $pp_j^i$]
Array Multiplier: Critical Paths
[Figure: the array (rows HA FA FA HA, FA FA FA HA, FA FA FA HA) with its critical paths marked]
Carry Save Multiplier
Only one critical path
One extra adder
Suitable for CLA
[Figure: carry save multiplier array of HA and FA cells with a final merging adder row]
Pipelining
[Figure: array multiplier with inputs x3..x0 and y0..y3, outputs z6..z0, and pipeline stages between the rows]
Pipelining
[Figure: carry save multiplier with pipeline stages between the rows]
Multiplier Floorplan
[Figure: floorplan of the carry save multiplier as rows of HA and FA cells]
Two’s Complement (Horner’s Rule)
$A \times B =$
$\quad b_0 2^0 \times (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) +$
$\quad b_1 2^1 \times (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) +$
$\quad b_2 2^2 \times (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) -$
$\quad b_3 2^3 \times (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0)$
The first three rows are solved by sign extension. The last row needs to be rewritten:
$-b_3 2^3 \times (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0)$
$= b_3 2^3 \times (a_3 2^3 - a_2 2^2 - a_1 2^1 - a_0)$
$= b_3 2^3 \times (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0 + 1)$
$= b_3 2^3 \times (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0) + b_3 2^3$
The word is complemented and $b_3$ is added at the LSB of that row.
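The rewritten last row can be checked exhaustively for 4-bit words. The sketch below is an illustration, with hypothetical helper names; it accumulates the first three rows with the signed value of A and replaces the last row by the complemented word plus one at the LSB of that row.

```python
def to_signed(bits, width=4):
    """Interpret an unsigned bit pattern as a two's complement value."""
    return bits - (1 << width) if (bits >> (width - 1)) & 1 else bits

def horner_multiply(a_bits, b_bits, width=4):
    """Two's complement product using the rewritten row for the sign bit of B."""
    mask = (1 << width) - 1
    a_val = to_signed(a_bits, width)
    a_complemented = to_signed(~a_bits & mask, width)   # value of the inverted word
    product = 0
    for i in range(width - 1):
        # Rows for b_0 .. b_{width-2}: A (sign extended) times b_i times 2^i
        product += ((b_bits >> i) & 1) * a_val * 2 ** i
    b_msb = (b_bits >> (width - 1)) & 1
    # Last row: complemented A plus the "LSB one", both gated by the sign bit of B
    product += b_msb * 2 ** (width - 1) * (a_complemented + 1)
    return product

print(horner_multiply(0b1011, 0b1101))   # (-5) * (-3)
```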
Multiplication (Horner’s Rule)
                      a3     a2     a1     a0
                 ×    b3     b2     b1     b0
                     -a3b0   a2b0   a1b0   a0b0      $[A \times b_0]\,2^0$
              -a3b1   a2b1   a1b1   a0b1             $[A \times b_1]\,2^1$
       -a3b2   a2b2   a1b2   a0b2                    $[A \times b_2]\,2^2$
-a3b3   a2b3   a1b3   a0b3                           last row, complemented
                      b3                             added at the LSB of the last row
   p6     p5     p4     p3     p2     p1     p0
Last row: $b_3 2^3 \times (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0) + b_3 2^3$
Multiplication (Horner’s Rule)
Negative MSBs solved with sign extension, one in each partial product
Not used if the result is truncated
Multiplication (Horner’s Rule)
Sign extension, one in each partial product
Note: Carry Ripple
[Figure: carry ripple array with the sign extensions $a_3b_0 + a_3b_1$ and $pp_3^1 + a_3b_2$, the complemented last row, and the "LSB one"]
Multiplication (Horner’s Rule)
Using Carry Save and Vector Merging Adder
CSA Cell
[Figure: FA with inputs A, B, C from stage i-1 and outputs S, C to stage i+1]
Counts the number of ones at the input and compresses it to a binary number
Often called: 3-2 compressor, (3, 2) counter
Others are e.g. (2, 2), (7, 3) …
Used to form CSA trees
Wallace tree
Four 4-bit words to add
[Figure: dot diagram over bit positions 6..0 showing the first stage, the first stage result, the second stage and the second stage result (sum and carry); the columns are reduced with FAs and HAs ((2, 2) counters)]
Wallace tree
[Figure: Wallace tree over bit positions 6..0 built from HAs and FAs and merged by a CLA]
Six adders (12 in CSA)
Very high speed!
Pipelined Wallace tree
[Figure: the Wallace tree with a row of pipeline registers (R) in front of the CLA]
Very often combined with Booth's modified encoding
64 Bit Wallace Tree Multiplier
Booth´s Modified Algorithm
Recode binary numbers $x_i \in \{0, 1\}$ to $y_i \in \{-2, -1, 0, 1, 2\}$
Five possible digits in $y_i$: radix 5?
Overlapping radix 4 method
Five digits require coding by 3 binary bits
Booth´s Modified Algorithm
$X = -x_{k-1} \times 2^{k-1} + \sum_{i=0}^{k-2} x_i \times 2^i, \quad x_i \in \{0, 1\}$
Example: $k = 6$
$X = -32x_5 + 16x_4 + 8x_3 + 4x_2 + 2x_1 + x_0$
$X = 16(-2x_5 + x_4 + x_3) + 4(-2x_3 + x_2 + x_1) + (-2x_1 + x_0 + x_{-1})$, with $x_{-1} = 0$
If $y_i = -2x_{i+1} + x_i + x_{i-1}$:
$X = Y = 16y_4 + 4y_2 + y_0, \quad y_i \in \{-2, -1, 0, 1, 2\}$
$Y = \sum_{i=0,\ i\ \text{even}}^{k-2} y_i \times 2^i \Rightarrow Y = \sum_{n=0}^{k/2-1} y_{2n} \times 4^n$ (i.e. Radix 4)
Booth´s Modified Algorithm
$y_i = -2x_{i+1} + x_i + x_{i-1}$
x(i+1)  x(i)  x(i-1)   y(i)
0       0     0         0
0       0     1         1
0       1     0         1
0       1     1         2
1       0     0        -2
1       0     1        -1
1       1     0        -1
1       1     1         0
Examples (each radix 4 digit is written as a pair, with a trailing 0 appended to X):
X = 01 11 01 10 (0) ⇒ Y = 02 0(-1) 02 0(-2)
X = 00 10 01 11 (0) ⇒ Y = 01 0(-2) 02 0(-1)
X = 10 11 10 10 (0) ⇒ Y = 0(-1) 00 0(-1) 0(-2)
There will always be at least one “0” in each pair
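The recoding table can be applied mechanically. The following sketch is illustrative (the function names are not from the slides); it recodes a two's complement bit string of even length into radix 4 digits and evaluates them, which reproduces the three examples.

```python
def booth_recode(bits):
    """Modified Booth recoding of a two's complement string (MSB first).

    Returns the radix 4 digits y_i in {-2, -1, 0, 1, 2}, most significant first.
    The string length is assumed to be even.
    """
    x = [int(ch) for ch in reversed(bits)]      # x[0] is the LSB
    digits = []
    for i in range(0, len(x), 2):
        x_prev = x[i - 1] if i > 0 else 0       # x_{-1} = 0
        digits.append(-2 * x[i + 1] + x[i] + x_prev)
    return digits[::-1]

def booth_value(digits):
    """Evaluate radix 4 digits given most significant first."""
    value = 0
    for d in digits:
        value = 4 * value + d
    return value

for word in ("01110110", "00100111", "10111010"):
    y = booth_recode(word)
    print(word, y, booth_value(y))
```

Only half as many partial products remain, and each one is 0, ±X or ±2X, so it can be formed with a shift and an optional negation.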
Booth´s Modified Algorithm
Conventional:                     Booth recoded:
        0 1 0 1    5                      0 1 0 1    5
  ×     0 1 1 1    7                ×        2 -1    7
        0 1 0 1    1 × 5            1 1 1 1 1 0 1 1    -5
      0 1 0 1      2 × 5          + 0 1 0 1            2 × 4 × 5
    0 1 0 1        4 × 5            0 0 1 0 0 0 1 1
+ 0 0 0 0          0 × 5
  0 0 1 0 0 0 1 1
-1 ⇒ two´s complement conversion
2 ⇒ shift one step (multiply by two)
-2 ⇒ two´s complement conversion + shift
Booth´s Modified Algorithm
[Figure: two rows of adders, each fed by Booth MUXes that select 1× or 2× of the inputs Xi, Xi+1, Xi+2; one Booth coder per row takes the overlapping bits yj-1, yj, yj+1 and yj+1, yj+2, yj+3]
Booth´s Modified Algorithm
[Figure: layout with the Booth muxes, the Booth coders (one cell) and the adders marked]
Adder/Subtractor
[Figure: four FAs with inputs A0..A3 and B0..B3; CTRL drives XOR gates on the B inputs and the first FA]
CTRL  B  XOR
0     0  0
0     1  1
1     0  1
1     1  0
Overflow
[Figure: 3-bit two´s complement sum plotted against the correct sum; overflow changes the sign]
Increase the dynamic range
Larger area (more adder cells)
Scale down
Decreases the dynamic range
Use saturation logic
Often a good solution
Overflow
Increase the dynamic range
0 1 1 (3) + 0 1 1 (3) = 1 1 0 (-2)
0 0 1 1 (3) + 0 0 1 1 (3) = 0 1 1 0 (6)
[Figure: adder structure for A, B, C and D (inputs A0..A4, B0..B4, C0..C4, D0..D4) built from FA rows, next to a version with extra HA cells that widens the result from S0..S4 to S0..S6]
Overflow
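Scale down and scale up after
Better than overflow
[Figure: 0 1 1 (3) + 0 1 1 (3) = 1 1 0 (-2) overflows; after scaling down, each operand is worth 2 and the sum is worth 4]
[Figure: filter f(n) between x(n) and y(n), with the input scaled by $1/\beta$ and the output scaled by $\beta$]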
Overflow - Saturation
[Figure: 3-bit two´s complement sum, where overflow changes the sign, and 3-bit saturated sum, both plotted against the correct sum]
Saturation Arithmetic
[Figure: saturation logic that receives Cout-msb, Cin-msb and the sign bit from the adder and produces the saturated output]
Overflow if Cout-msb differs from Cin-msb
On overflow, Cin-msb = 0 indicates negative overflow (NOF) and Cin-msb = 1 indicates positive overflow (POF)
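The overflow test on the two MSB carries can be modelled as follows. This is a sketch under the assumption that the operands are already valid two's complement values of the given width; the function name is illustrative.

```python
def saturating_add(a, b, width=3):
    """Two's complement addition that saturates instead of wrapping around."""
    mask = (1 << width) - 1
    low_mask = (1 << (width - 1)) - 1
    a_bits, b_bits = a & mask, b & mask
    # Carry into the MSB comes from adding the lower width-1 bits.
    c_in_msb = ((a_bits & low_mask) + (b_bits & low_mask)) >> (width - 1)
    total = a_bits + b_bits
    c_out_msb = total >> width
    if c_out_msb != c_in_msb:                 # overflow detected
        # Cin-msb = 1: positive overflow, Cin-msb = 0: negative overflow
        return low_mask if c_in_msb else -(1 << (width - 1))
    result = total & mask
    return result - (1 << width) if result >> (width - 1) else result

print(saturating_add(3, 3))     # clamps to the largest 3-bit value
print(saturating_add(-4, -3))   # clamps to the smallest 3-bit value
print(saturating_add(1, 2))     # no overflow
```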
Limit Cycles
Example: recursive filter with zero input
[Figure: filter output with two's complement arithmetic compared with saturated arithmetic]
Source: Lars Wanhammar, “DSP Integrated circuits”
Fixed Coefficient Multiplication
[Figure: multiplication of a3 a2 a1 a0 by the fixed coefficient 0 1 0 0 1 0; the all-zero partial product rows are dropped, leaving two shifted copies of A that are added with HAs to give p6 .. p0, with p6 repeated as sign extension]
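Serial Addition
[Figure: a) Bit-serial adder: one FA with inputs $a_i$, $b_i$, output $s_i$, and the carry $c_{out,i}$ fed back through a delay Δ. b) Digit-serial adder: FAs for $a_i, a_{i+1}, a_{i+2}$ and $b_i, b_{i+1}, b_{i+2}$ producing $s_i, s_{i+1}, s_{i+2}$, with the last carry fed back through a delay Δ]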
Bit-serial Multiplication
[Figure: bit-serial multiplier; the data $a_i$ enters LSB first with sign extension, the coefficient bits h0(k)..h3(k) come from a coefficient ROM, and the product $p_i$ leaves LSB first]
Fixed Coefficient Multiplication
[Figure: bit-serial multiplier with the fixed coefficient hard wired; the cells for zero coefficient bits are removed step by step]
Saves more than 1/2 of the adders on average
Example: Coef. from a Hilbert Filter
Bit-Parallel
[Figure: bit-parallel multiplier for inputs a0..a9 built from rows of FA and HA cells, with outputs s0..s15 and the binary point marked]
Bit-Serial
[Figure: bit-serial multiplier with two FAs]
C = 00001101
Signed Digit
A redundant representation where x∈{-1,0,1}
Example:
0 0 0 1 = 0 0 1 -1 = 0 1 -1 -1 ……
A sequence of ones:
0 1 1 1 1 0 = 1 0 0 0 -1 0
16 + 8 + 4 + 2 = 32 - 2
Canonical Signed Digit (CSD)
A sequence of ones can be replaced with:
1. A “-1” at the least significant position of the sequence.
2. A “1” at the position to the left of the most significant position of the sequence.
3. Zeros between the “1” and the “–1”
1 1 1 0 1 0 1 1
1 1 1 0 1 1 0 -1
1 1 1 1 0 -1 0 -1
0 0 0 -1 0 -1 0 -1
Saves more than 2/3 of the adder cells at an average
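The replacement rule can be applied from the LSB upward with simple integer arithmetic. The sketch below is one common way to obtain the CSD digits and is offered as an illustration, not as the method used in the slides; the name `to_csd` is hypothetical.

```python
def to_csd(value):
    """Canonical signed digit form of an integer, digits listed LSB first.

    Each digit is -1, 0 or 1 and no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            # value % 4 == 1: an isolated one, keep +1
            # value % 4 == 3: start of a run of ones, emit -1 and carry upward
            digit = 2 - (value % 4)
            digits.append(digit)
            value -= digit
        value //= 2
    return digits

print(to_csd(30))    # 0 1 1 1 1 0  becomes  1 0 0 0 -1 0 (printed LSB first)
print(to_csd(-21))   # 1 1 1 0 1 0 1 1 as an 8-bit two's complement word
```

Every nonzero digit costs one adder or subtractor in a fixed coefficient multiplier, so fewer nonzero digits means fewer adder cells.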
Canonical Signed Digit
[Figure: the recoding 1 1 1 0 1 0 1 1 → 0 0 0 -1 0 -1 0 -1 and the resulting bit-serial multiplier from $a_i$ to $p_i$; a) cell with inputs $a_i$, $b_i$, carry $c_i$ to $c_{i+1}$ and reset, b) cell with inputs $a_i$, $b_i$, carry $c_i$ to $c_{i+1}$ and set]
Signed Digit Representation
Booth’s modified algorithm
For variable coefficients
Canonical Signed Digit
For fixed coefficients
Optimal
Distributed Arithmetic
Often used in summation of inner products
for example Discrete Cosine Transform (DCT)
$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} = \begin{bmatrix} c_2 & c_2 & c_2 & c_2 \\ c_1 & c_3 & -c_3 & -c_1 \\ c_2 & -c_2 & -c_2 & c_2 \\ c_3 & -c_1 & c_1 & -c_3 \end{bmatrix} \times \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix}$
Distributed Arithmetic
Sum of inner products
$Y = \sum_{i=0}^{N-1} c_i x_i = c_0 x_0 + c_1 x_1 + c_2 x_2 + \cdots + c_{N-1} x_{N-1}$
$c_i$ are M-bit coefficients and $x_i$ are W-bit numbers:
$x_i = -x_{i,W-1} + \sum_{j=1}^{W-1} x_{i,W-1-j} \times 2^{-j}$
Distributed Arithmetic
Bits in the word $x_i$:
$Y = \sum_{i=0}^{N-1} c_i x_i = \sum_{i=0}^{N-1} c_i \left( -x_{i,W-1} + \sum_{j=1}^{W-1} x_{i,W-1-j} \times 2^{-j} \right)$
$= -\sum_{i=0}^{N-1} c_i x_{i,W-1} + \sum_{i=0}^{N-1} \sum_{j=1}^{W-1} c_i x_{i,W-1-j} \times 2^{-j}$
Interchanged summation order:
$= -\sum_{i=0}^{N-1} c_i x_{i,W-1} + \sum_{j=1}^{W-1} \left[ \sum_{i=0}^{N-1} c_i x_{i,W-1-j} \right] \times 2^{-j}$
The terms inside the bracket have the same bit weight.
Example: Distributed Arithmetic
Traditional summation order
$Y = c_0 x_0 + c_1 x_1 + c_2 x_2 =$
$\quad -c_0 x_{0,2} + c_0 x_{0,1} 2^{-1} + c_0 x_{0,0} 2^{-2} +$
$\quad -c_1 x_{1,2} + c_1 x_{1,1} 2^{-1} + c_1 x_{1,0} 2^{-2} +$
$\quad -c_2 x_{2,2} + c_2 x_{2,1} 2^{-1} + c_2 x_{2,0} 2^{-2}$
Note: $c_i$ are M-bit constants and $x_{i,j}$ are single bits
Example: Distributed Arithmetic
Interchanged summation order
$Y = c_0 x_0 + c_1 x_1 + c_2 x_2 =$
$\quad -c_0 x_{0,2} + c_0 x_{0,1} 2^{-1} + c_0 x_{0,0} 2^{-2} +$
$\quad -c_1 x_{1,2} + c_1 x_{1,1} 2^{-1} + c_1 x_{1,0} 2^{-2} +$
$\quad -c_2 x_{2,2} + c_2 x_{2,1} 2^{-1} + c_2 x_{2,0} 2^{-2}$
The first column holds the sign bits; the sum is now taken column by column.
Example: Distributed Arithmetic
Interchanged summation order (rewritten)
$Y = -(c_0 x_{0,2} + c_1 x_{1,2} + c_2 x_{2,2})$ (sign bits)
$\quad + (c_0 x_{0,1} + c_1 x_{1,1} + c_2 x_{2,1}) \times 2^{-1}$
$\quad + (c_0 x_{0,0} + c_1 x_{1,0} + c_2 x_{2,0}) \times 2^{-2}$
x0,j  x1,j  x2,j   ROM
0     0     0      0
0     0     1      c2
0     1     0      c1
0     1     1      c1+c2
1     0     0      c0
1     0     1      c0+c2
1     1     0      c0+c1
1     1     1      c0+c1+c2
Example: Distributed Arithmetic
[Figure: the bits x0,j, x1,j and x2,j (LSB first) address a $2^N$ word ROM holding the table of coefficient sums; the ROM output feeds a shift accumulator with a register (REG)]
Example: Distributed Arithmetic
x0,j  x1,j  x2,j   ROM    Coeff.
0     0     0      0.00   0
0     0     1      0.10   c2
0     1     0      0.01   c1
0     1     1      0.11   c1+c2
1     0     0      0.00   c0
1     0     1      0.10   c0+c2
1     1     0      0.01   c0+c1
1     1     1      0.11   c0+c1+c2
$x_0 = 0.11$, $x_1 = 0.10$, $x_2 = 0.01$
$c_0 = 0.00$, $c_1 = 0.01$, $c_2 = 0.10$
$\text{Sum} = rom_0 + rom_6 \times \frac{1}{2} + rom_5 \times \frac{1}{4} = 0.00 + 0.001 + 0.0010 = 0.0100$
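The ROM lookup and the bit-by-bit accumulation can be reproduced in software. The following sketch is an illustration; the function names and the choice of bit strings with the sign bit first are assumptions. It subtracts the ROM word addressed by the sign bits, which is zero in the example above.

```python
from fractions import Fraction

def build_rom(coeffs):
    """ROM word at address (x0 x1 x2 ...) is the sum of the selected coefficients."""
    n = len(coeffs)
    rom = []
    for address in range(2 ** n):
        bits = [(address >> (n - 1 - i)) & 1 for i in range(n)]
        rom.append(sum(c * b for c, b in zip(coeffs, bits)))
    return rom

def distributed_inner_product(coeffs, x_words):
    """Inner product of coefficients with fractional two's complement words.

    x_words are equally long bit strings with the sign bit first.
    """
    rom = build_rom(coeffs)
    w = len(x_words[0])

    def address(bit_pos):
        # One bit from every word forms the ROM address for that bit weight.
        return int("".join(word[bit_pos] for word in x_words), 2)

    total = -rom[address(0)]                     # sign bits have negative weight
    for j in range(1, w):
        total += rom[address(j)] * Fraction(1, 2 ** j)
    return total

coeffs = [Fraction(0), Fraction(1, 4), Fraction(1, 2)]      # 0.00, 0.01, 0.10
print(distributed_inner_product(coeffs, ["011", "010", "001"]))   # 1/4 = 0.0100
```

The multipliers disappear entirely: each bit position costs one ROM access and one add in the shift accumulator.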
Restoring Division
436 / 15 (0110110100 / 01111); the first subtrahend is $15 \times 2^5 = 480$
Operation               Result   Quotient   Sign
436 - 480 (Subtract)    -44      0          Negative
-44 + 480 (Restore)     436      0
436 - 240 (Shift&Sub)   196      01         Positive
196 - 120 (Shift&Sub)   76       011        Positive
76 - 60 (Shift&Sub)     16       0111       Positive
16 - 30 (Shift&Sub)     -14      01110      Negative
-14 + 30 (Restore)      16       01110
16 - 15 (Shift&Sub)     1        011101     Positive
Quotient: 011101 = 29
Remainder: 000001
Restoring Division
0110110100 (436) / 01111 (15)
Operation        Result        Quotient   Sign
Subtract         1111010100    0          Negative
Restore (Add)    0110110100    0
Shift&Sub        011000100     01         Positive
Shift&Sub        01001100      011        Positive
Shift&Sub        0010000       0111       Positive
Shift&Sub        110010        01110      Negative
Restore (Add)    010000        01110
Shift&Sub        000001        011101     Positive
Quotient: 011101 = 29
Remainder: 000001
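The trace above follows a simple loop. The sketch below is illustrative (the function name and the `steps` parameter are assumptions); instead of adding the denominator back, it just keeps the old remainder when the trial subtraction turns negative, which has the same effect as the restore step.

```python
def restoring_divide(numerator, denominator, steps=6):
    """Restoring division of non-negative integers, one quotient bit per step."""
    remainder = numerator
    quotient_bits = []
    for shift in range(steps - 1, -1, -1):
        trial = remainder - (denominator << shift)   # Subtract / Shift&Sub
        if trial < 0:
            quotient_bits.append(0)                  # Negative: restore the remainder
        else:
            quotient_bits.append(1)                  # Positive: keep the result
            remainder = trial
    quotient = int("".join(map(str, quotient_bits)), 2)
    return quotient, remainder, quotient_bits

print(restoring_divide(436, 15))   # quotient 29, remainder 1
```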
Non-restoring Division
Restoring: 1. Add the denominator 2. Subtract half of it
436 - 480 = -44      Subtract, Negative
-44 + 480 = 436      Restore (Add)
436 - 240 = 196      Shift&Sub
Non-restoring: 1. Add half of the denominator
436 - 480 = -44      Subtract, Negative
-44 + 240 = 196      Shift&Add
Non-restoring Division
Restoring: 1. Add the denominator 2. Subtract half of it
0110110100 - 01111 (aligned at the MSB) = 1111010100      Subtract, Negative
1111010100 + 01111 = 0110110100                           Restore (Add)
0110110100 - 01111 (shifted one step) = 011000100         Shift&Sub
Non-restoring: 1. Add half of the denominator
0110110100 - 01111 (aligned at the MSB) = 1111010100      Subtract, Negative
1111010100 + 01111 (shifted one step) = 011000100         Shift&Add
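The non-restoring shortcut can be sketched in the same style as the restoring loop. This is an illustration with hypothetical names; the final correction step for a negative last remainder is a standard addition that the slides do not show.

```python
def non_restoring_divide(numerator, denominator, steps=6):
    """Non-restoring division: add or subtract depending on the remainder sign."""
    remainder = numerator
    quotient = 0
    for shift in range(steps - 1, -1, -1):
        if remainder >= 0:
            remainder -= denominator << shift     # Shift&Sub
        else:
            remainder += denominator << shift     # Shift&Add, no restore step
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:                             # final correction (assumption)
        remainder += denominator
    return quotient, remainder

print(non_restoring_divide(436, 15))   # quotient 29, remainder 1
```

Each step needs exactly one add or subtract, which is why the array divider only requires an ADD/SUB control per row.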
Array Divider Non-restoring
[Figure: one row of the array divider; four FAs with inputs A0..A3 and B0..B3, where CTRL drives XOR gates that select ADD/SUB after each shift]
Division by Reciprocation
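To compute $q = \frac{z}{d}$
compute $1/d$
multiply $q = z \times \frac{1}{d}$
Particularly efficient when several divisions by $d$ are needed
$\begin{bmatrix} \frac{a}{d} & \frac{b}{d} \\ \frac{c}{d} & \frac{e}{d} \end{bmatrix} = \frac{1}{d} \begin{bmatrix} a & b \\ c & e \end{bmatrix}$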
Newton Raphson: Efficient 1/d computing
$f(x) = \frac{1}{x} - d; \quad f'(x) = -\frac{1}{x^2}$
$x(i+1) = x(i) - \frac{f(x(i))}{f'(x(i))} = x(i) - \frac{\frac{1}{x(i)} - d}{-\frac{1}{x(i)^2}} = x(i) + x(i) - d\,x(i)^2$
$x(i+1) = 2x(i) - d\,x(i)^2$
Convergence Speed up in NR
Convergence is slow in the beginning
The number of bits doubles each iteration
Speedup is possible
Use a lookup table to set the start value
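The iteration $x(i+1) = 2x(i) - d\,x(i)^2$ is easy to try out. The sketch below uses floating point for illustration; the function name and the start values are assumptions, standing in for the lookup table mentioned above.

```python
def reciprocal(d, x0, iterations=5):
    """Approximate 1/d with Newton Raphson, using only multiply and subtract."""
    x = x0
    for i in range(iterations):
        x = x * (2 - d * x)          # same as 2*x - d*x*x
        print(i + 1, x, abs(1 - d * x))   # relative error squares each step
    return x

reciprocal(3.0, 0.3)
```

The printed error drops from about 1e-2 to 1e-4 to 1e-8, which is the doubling of correct bits per iteration. A poor start value costs several iterations before this quadratic phase is reached, which is the motivation for the lookup table.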
The CORDIC Algorithm
Iterative algorithm for circular rotations
Example: Derive sine, cosine …
No multiplications
CORDIC: COordinate Rotation DIgital Computer
Presented by Jack E. Volder 1959
Real Rotation
$x_{i+1} = x_i \cos\alpha_i - y_i \sin\alpha_i = \cos\alpha_i\,(x_i - y_i \tan\alpha_i)$
$y_{i+1} = y_i \cos\alpha_i + x_i \sin\alpha_i = \cos\alpha_i\,(y_i + x_i \tan\alpha_i)$
Find the $x, y$ coordinates $(x_1, y_1)$ for a given angle $\alpha$
Example, starting at $(x_0, y_0) = (1, 0)$:
$x_1 = \cos\alpha_0\,(x_0 - y_0 \tan\alpha_0) = \cos\alpha_0 \times x_0 = k_0 \times x_0$
$y_1 = \cos\alpha_0\,(y_0 + x_0 \tan\alpha_0) = \cos\alpha_0 \times x_0 \tan\alpha_0 = k_0 \times x_0 \tan\alpha_0$
[Figure: true rotation by the angle α on the unit circle from $(x_0, y_0) = (1, 0)$ to $(x_1, y_1)$]
Real Rotation
The rotation angle is restricted to $\tan\alpha_i = \pm 2^{-i}$, i.e. a shift
$x_{i+1} = x_i \cos\alpha_i - y_i \sin\alpha_i = \cos\alpha_i\,(x_i - y_i \tan\alpha_i) = k_i (x_i - y_i \tan\alpha_i) = k_i (x_i - d_i\, y_i\, 2^{-i})$
$y_{i+1} = y_i \cos\alpha_i + x_i \sin\alpha_i = \cos\alpha_i\,(y_i + x_i \tan\alpha_i) = k_i (y_i + x_i \tan\alpha_i) = k_i (y_i + d_i\, x_i\, 2^{-i})$
with $d_i = \pm 1$
However, a multiplication with the constant $k_i = \cos\alpha_i$ remains
CORDIC: Pseudo Rotation
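$x_{i+1} = x_i - y_i \tan\alpha_i$
$y_{i+1} = y_i + x_i \tan\alpha_i$
No multiplication
Example, starting at $(x_0, y_0) = (1, 0)$:
$x_1 = x_0 - y_0 \tan\alpha_0 = 1$
$y_1 = y_0 + x_0 \tan\alpha_0 = \tan\alpha_0$
However, the length $R_1 > 1$:
$R_{i+1} = \frac{R_i}{\cos\alpha_i} = \left\{ \frac{1}{\cos^2\alpha_i} = 1 + \tan^2\alpha_i \right\} = R_i \sqrt{1 + \tan^2\alpha_i}$
[Figure: pseudo rotation and true rotation by $\alpha$ from $(x_0, y_0) = (1, 0)$ on the unit circle; the pseudo rotated vector $(x_1, y_1)$ is longer, $R_{i+1}$ versus $R_i$]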
CORDIC: Pseudo Rotation
The angle $\alpha$ is known
Derive $x, y$ using three iterations where $\alpha - \alpha_0 - \alpha_1 - \alpha_2 \to 0$:
$87° - 45.0° - 26.6° - 14.0° = 1.4°$
[Figure: vectors R0..R3 ending at $(x_0, y_0)$ .. $(x_3, y_3)$, rotated by $\alpha_0$, $\alpha_1$, $\alpha_2$ towards $\alpha = 87°$]
CORDIC: Three Iterations
The vector length R is increasing each iteration
$R_0 = 1$
$R_1 = R_0 \sqrt{1 + \tan^2\alpha_0} = \sqrt{1 + \tan^2 45°} = \sqrt{2} = 1.41$
$R_2 = R_1 \sqrt{1 + \tan^2\alpha_1} = \sqrt{2}\,\sqrt{1 + \tan^2 26.6°} = \sqrt{\frac{5}{2}} = 1.58$
$R_3 = R_2 \sqrt{1 + \tan^2\alpha_2} = \sqrt{\frac{5}{2}}\,\sqrt{1 + \tan^2 14.0°} = \sqrt{\frac{85}{32}} = 1.63$
[Figure: vectors R0..R3 ending at $(x_0, y_0)$ .. $(x_3, y_3)$ for $\alpha = 87°$]
CORDIC Derive x3,y3
$\tan\alpha_0 = 1; \quad \tan\alpha_1 = \frac{1}{2}; \quad \tan\alpha_2 = \frac{1}{4}$
$x_{i+1} = x_i - y_i \tan\alpha_i$
$y_{i+1} = y_i + x_i \tan\alpha_i$
$x_1 = x_0 - y_0 \times 1 = 1$
$y_1 = y_0 + x_0 \times 1 = 1$
$x_2 = x_1 - y_1 \times \frac{1}{2} = \frac{1}{2}$
$y_2 = y_1 + x_1 \times \frac{1}{2} = \frac{3}{2}$
$x_3 = x_2 - y_2 \times \frac{1}{4} = \frac{1}{8}$
$y_3 = y_2 + x_2 \times \frac{1}{4} = \frac{13}{8}$
[Figure: vectors R0..R3 ending at $(x_0, y_0)$ .. $(x_3, y_3)$ for $\alpha = 87°$]
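CORDIC: The sign determines the rotation direction
$\alpha = 30° \Rightarrow$ Pos. Rot.
$\alpha - \alpha_0 = 30° - 45° = -15° \Rightarrow$ Neg. Rot.
$\alpha - \alpha_0 + \alpha_1 = -15° + 26.6° = 11.6° \Rightarrow$ Pos. Rot.
$\alpha - \alpha_0 + \alpha_1 - \alpha_2 = 11.6° - 14° = -2.4° \Rightarrow$ Neg. Rot.
The lengths $R_i$ are constant (precalculated):
$R_1 = \sqrt{2}$, $R_2 = \sqrt{\frac{5}{2}}$, $R_3 = \sqrt{\frac{85}{32}}$
$(x, y) \approx \left( \frac{x_3}{R_3}, \frac{y_3}{R_3} \right)$
[Figure: vectors R0..R3 ending at $(x_0, y_0)$ .. $(x_3, y_3)$, rotating back and forth around $\alpha = 30°$]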
CORDIC Derive x3,y3
$\tan\alpha_0 = 1; \quad \tan\alpha_1 = \frac{1}{2}$
$x_{i+1} = x_i - y_i \tan\alpha_i$
$y_{i+1} = y_i + x_i \tan\alpha_i$
$x_1 = x_0 - y_0 \times 1 = 1$
$y_1 = y_0 + x_0 \times 1 = 1$
Negative rotation:
$x_2 = x_1 + y_1 \times \frac{1}{2} = \frac{3}{2}$
$y_2 = y_1 - x_1 \times \frac{1}{2} = \frac{1}{2}$
$x_3 = x_2 - y_2 \times \frac{1}{4} = \frac{11}{8}$
$y_3 = y_2 + x_2 \times \frac{1}{4} = \frac{7}{8}$
[Figure: vectors R0..R3 ending at $(x_0, y_0)$ .. $(x_3, y_3)$ for $\alpha = 30°$]
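CORDIC: New start vector (no need for multiplication)
Start at $(x_0, y_0) = \left( \frac{1}{R_3}, 0 \right) = \left( \sqrt{\frac{32}{85}}, 0 \right) \Rightarrow (x_3, y_3) \approx (x, y)$
With $x_2, y_2$ taken from the pseudo rotation that starts at $(1, 0)$:
$x_3 = \sqrt{\frac{32}{85}} \left( x_2 - \frac{1}{4} y_2 \right) = \sqrt{\frac{32}{85}} \times \frac{11}{8}$
$y_3 = \sqrt{\frac{32}{85}} \left( y_2 + \frac{1}{4} x_2 \right) = \sqrt{\frac{32}{85}} \times \frac{7}{8}$
[Figure: scaled start vector $(x_0, y_0)$ and the result $(x_3, y_3) \approx (x, y)$ for $\alpha = 30°$]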
Sine and Cosine
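With the scale factor $\frac{1}{R_3} = \sqrt{\frac{32}{85}}$ applied to the pseudo rotation result:
$x_3 = \sqrt{\frac{32}{85}} \left( x_2 - \frac{1}{4} y_2 \right) = \sqrt{\frac{32}{85}} \times \frac{11}{8} = 0.844 = \cos(\alpha_0 - \alpha_1 + \alpha_2)$
$y_3 = \sqrt{\frac{32}{85}} \left( y_2 + \frac{1}{4} x_2 \right) = \sqrt{\frac{32}{85}} \times \frac{7}{8} = 0.537 = \sin(\alpha_0 - \alpha_1 + \alpha_2)$
$x_3 = \cos \sum \alpha_i \approx \cos\alpha$
$y_3 = \sin \sum \alpha_i \approx \sin\alpha$
$\tan\alpha \approx \tan \sum \alpha_i = \frac{\sin \sum \alpha_i}{\cos \sum \alpha_i}$ (division needed)
Here $\alpha = 30°$.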
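The whole procedure (prestored angles, scaled start vector, sign-controlled pseudo rotations) fits in a short routine. This is a floating point sketch for illustration; the function name is hypothetical, and a hardware version would replace the multiplications by $2^{-i}$ with shifts and read the angles and the gain from tables.

```python
import math

def cordic_rotate(angle_deg, iterations):
    """Approximate (cos, sin) of angle_deg with CORDIC pseudo rotations."""
    # Prestored angles arctan(2^-i): 45, 26.6, 14.0, 7.1, 3.6 ... degrees
    angles = [math.degrees(math.atan(2.0 ** -i)) for i in range(iterations)]
    # Prestored total vector growth R_n = product of sqrt(1 + 2^(-2i))
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    x, y = 1.0 / gain, 0.0          # new start vector (1/R_n, 0)
    residual = angle_deg
    for i in range(iterations):
        d = 1.0 if residual >= 0 else -1.0     # sign selects the direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        residual -= d * angles[i]
    return x, y

print(cordic_rotate(30.0, 3))     # about (0.844, 0.537), as in the example
print(cordic_rotate(30.0, 16))    # close to (cos 30°, sin 30°)
```

With three iterations the result corresponds to an angle of 45° - 26.6° + 14.0° = 32.4°, so the accuracy is limited by the remaining residual angle rather than by the arithmetic.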
Basic CORDIC Rotations
How to choose the angles
Shifts                          Angles (prestored)
$\tan\alpha_0 = 1$              $\alpha_0 = \arctan 1 = 45°$
$\tan\alpha_1 = \frac{1}{2}$    $\alpha_1 = \arctan \frac{1}{2} = 26.6°$
$\tan\alpha_2 = \frac{1}{4}$    $\alpha_2 = \arctan \frac{1}{4} = 14.0°$
$\tan\alpha_3 = \frac{1}{8}$    $\alpha_3 = \arctan \frac{1}{8} = 7.1°$
$\tan\alpha_4 = \frac{1}{16}$   $\alpha_4 = \arctan \frac{1}{16} = 3.6°$
Vector lengths $R_i$ are also prestored
Basic CORDIC Rotations
$x_{i+1} = x_i - d_i\, y_i\, \frac{1}{2^i}$
$y_{i+1} = y_i + d_i\, x_i\, \frac{1}{2^i}$
$\alpha_{i+1} = \alpha_i - d_i \arctan \frac{1}{2^i}$
$d_i = \operatorname{sign}(\alpha_i)$
Each CORDIC iteration requires
3 ADD/SUB
2 Shifts
CORDIC Hardware: Iterative
Each CORDIC iteration requires
3 ADD/SUB
2 Shifts
[Figure: iterative datapath with X, Y and α registers (REG), three ADD/SUB units, two shifters, and a lookup table for the angles]
CORDIC Hardware: Unrolled
[Figure: three unrolled stages of ADD/SUB units from $(x_0, y_0)$ via $(x_1, y_1)$ and $(x_2, y_2)$ to $(x_3, y_3)$, with hard wired shifts 1/2 and 1/4; the sign bit of the residual angle ($\alpha$, $\alpha - \alpha_0$, $\alpha - \alpha_0 - \alpha_1$) selects ADD or SUB in each stage]
$x_3 = \cos(\alpha_0 + \alpha_1 + \alpha_2)$
$y_3 = \sin(\alpha_0 + \alpha_1 + \alpha_2)$
CORDIC Summary
The CORDIC algorithm is used for
Polar/rectangular conversion
sine, cosine, tangent …
arcsine, arccosine, arctangent …
Hyperbolic functions
Division
Square root
…
No multiplications needed
One bit accuracy per iteration
Binary Shifter
Four bit shifter
[Figure: shifter built from bit slices with inputs A0..A3, outputs Q0..Q3, and the control lines Right, Left and NOP]
Binary Shifter
[Figure: the four bit shifter set to Right and to Left; the inputs A0..A3 move one position and the vacated position is filled with 0]
Logarithmic Shifter
[Figure: eight bit logarithmic shifter with inputs A0..A7 and three stages controlled by S1, S2 and S4 and their complements]
Example S = 101 (shift 5 bits)
[Figure: the logarithmic shifter with inputs A0..A7; S = 101 opens the S4 and S1 stages and bypasses the S2 stage, giving the outputs A7 A7 A7 A7 A7 A7 A6 A5]
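The stage structure of the logarithmic shifter can be mimicked with list operations. This sketch assumes an arithmetic right shift (the sign bit A7 is replicated), which is inferred from the output pattern listed for S = 101; the function name is illustrative.

```python
def logarithmic_shift_right(bits, shift):
    """Shift a word (list, MSB first) through stages of 1, 2, 4, ... positions.

    Each stage is either passed through or shifts by its fixed amount,
    depending on one bit of the shift amount. The MSB is replicated.
    """
    stage = 1
    while stage < len(bits):
        if shift & stage:                      # this stage is switched on
            bits = [bits[0]] * stage + bits[:-stage]
        stage *= 2
    return bits

word = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]
print(logarithmic_shift_right(word, 0b101))
```

An N bit word needs only $\log_2 N$ stages, compared with one switch row per shift distance in the simple shifter.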