
Advanced Digital IC Design

Arithmetic

Number Representation
Addition
Multiplication
Division
Distributed Arithmetic
Newton Raphson
CORDIC

Unsigned Number Representation

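Fixed radix (base) systems

The digits in a radix-$r$ system: $a_i \in \{0, 1, 2, \ldots, r-1\}$

A number is described in a fixed point positional number system as

$$\sum_{i=-l}^{k-1} a_i r^i = a_{k-1} r^{k-1} + a_{k-2} r^{k-2} + \cdots + a_1 r^{1} + a_0 r^{0} + a_{-1} r^{-1} + \cdots + a_{-l} r^{-l}$$

and is written as the digit string $a_{k-1} a_{k-2} \ldots a_1 a_0 \,.\, a_{-1} \ldots a_{-l}$, where the digits after the radix point are the fractional part.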

Example: Unsigned Number

Radix 10, $a_i \in \{0, 1, 2, \ldots, 9\}$:

$$\sum_{i=-l}^{k-1} a_i 10^i = a_{k-1} 10^{k-1} + a_{k-2} 10^{k-2} + \cdots + a_1 10^{1} + a_0 10^{0} + a_{-1} 10^{-1} + \cdots + a_{-l} 10^{-l}$$

Radix 2, $a_i \in \{0, 1\}$:

$$\sum_{i=-l}^{k-1} a_i 2^i = a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^{1} + a_0 2^{0} + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l}$$

Signed Digit Number Representation

The digits in a radix-$r$ system: $a_i \in \{-\alpha, \ldots, -1, 0, 1, \ldots, r-1-\alpha\}$

$$\sum_{i=-l}^{k-1} r^i \times a_i$$

Example Radix 10: $a_i \in \{-4, -3, \ldots, 0, \ldots, 4, 5\}$, where a bar denotes a negative digit

$$(3\,\bar{1}\,5)_{10} = 3 \times 10^{2} - 1 \times 10^{1} + 5 \times 10^{0} = 300 - 10 + 5 = 295$$

$$(3\,.\,\bar{1}\,5)_{10} = 3 \times 10^{0} - 1 \times 10^{-1} + 5 \times 10^{-2} = 3 - 0.1 + 0.05 = 2.95$$

Modified Booth’s recoding - a signed digit radix 4 representation

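Two’s Complement

The digits in a radix 2 system: $a_i \in \{0, 1\}$

$$-a_{k-1} 2^{k-1} + \sum_{i=-l}^{k-2} a_i 2^i = -a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^{1} + a_0 2^{0} + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l}$$

The number is written as $a_{k-1} a_{k-2} \ldots a_1 a_0 \,.\, a_{-1} \ldots a_{-l}$, where the digits after the point are the fractional part.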

Sign Bits

The most significant bit $a_{k-1}$ is the sign bit and carries negative weight:

$$-a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^{1} + a_0 2^{0}$$

Sign Extension in Two’s Complement

$$-a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^{1} + a_0 2^{0} = -a_{k-1} 2^{k} + a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2^{1} + a_0 2^{0}$$

Example:

10010 = 110010 = 1110010 = 11110010

00010 = 000010 = 0000010 = 00000010
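The sign extension example can be checked with a small sketch. The helper names `twos_complement_value` and `sign_extend` are hypothetical and only illustrate the rule that copying the sign bit to the left leaves the value unchanged.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a two's complement bit string given MSB first."""
    k = len(bits)
    sign_bit = int(bits[0])
    magnitude = int(bits[1:], 2) if k > 1 else 0
    # The sign bit has weight -2^(k-1), all other bits have positive weight.
    return -sign_bit * 2 ** (k - 1) + magnitude


def sign_extend(bits: str, width: int) -> str:
    """Extend a word to `width` bits by repeating its sign bit."""
    return bits[0] * (width - len(bits)) + bits


for word in ("10010", "00010"):
    for width in (5, 6, 7, 8):
        extended = sign_extend(word, width)
        print(extended, twos_complement_value(extended))
```

Every extended version of a word prints the same value as the original word.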

Addition

Addition is the most common arithmetic operation in digital processors

Also the basis of most other arithmetic operations like
multiplication
division
square root
…

Addition

Ripple Carry Adder (RCA)

[Figure: four full adders (FA) in a chain with inputs A0..A3 and B0..B3, carry input Ci,0, carry outputs Co,0..Co,3 and sum outputs S0..S3]

Critical Path through all adder cells
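A minimal behavioural sketch of the ripple carry adder, assuming bit lists given LSB first. The function names `full_adder`, `ripple_carry_add` and `to_bits` are hypothetical. The loop makes the critical path visible: each cell has to wait for the carry of the previous cell.

```python
def full_adder(a: int, b: int, c_in: int):
    """One FA cell: returns (sum, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def to_bits(value: int, width: int):
    """Unsigned value as a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_carry_add(a_bits, b_bits, c_in: int = 0):
    """Chain of FA cells; the carry ripples from cell 0 to the last cell."""
    sums = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


sum_bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(sum_bits, carry_out)
```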

Addition: Sign Extension

[Figure: ripple carry adder with inputs A0..A2 and B0..B2; the sign bits (S) are extended into two extra FA cells, giving sum outputs S0..S4]

Adding More Numbers

Carry Ripple Adders in a Chain

[Figure: three rows of adder cells (HA FA FA FA) connected in a chain; the first row adds A and B, the following rows add C and D, giving the sum S]

Critical Path through 6 adder cells

Adding More Numbers

Carry Ripple Adders in a Tree

[Figure: two rows of adder cells (HA FA FA FA) add A + B and C + D in parallel; a third row adds the two results, giving the sum S]

Critical Path through 5 adder cells

Adding More Numbers

Carry Save Adder (CSA)

[Figure: a row of FA cells adds A, B and C without carry propagation, a second row (HA FA FA FA) adds D, and a vector merging adder (FA FA FA) produces the sum S]

Only One Critical Path (through 5 adder cells)

Pipelining

Ripple Carry Adders in a Chain

[Figure: the chain of three adder rows (HA FA FA FA) with register (R) rows inserted between the adder rows]

Critical Path through 4 adder cells

R = Register

Pipelining

Ripple Carry Adders in a Tree

[Figure: the tree of adder rows (HA FA FA FA) with one register (R) row between the first level and the final adder row]

Lower Latency than the Chain Adder

R = Register

Latency

Latency: The number of clock cycles it takes before we see the result

Latency time: Latency * cycle time

[Figure: the pipelined tree adder and the pipelined chain adder side by side, with their register (R) rows]

Carry Save Adder (CSA)

Pipelining

[Figure: carry save adder rows (FA FA FA FA and HA FA FA FA) for the inputs A, B, C and D, with register (R) rows between the stages and a vector merging adder (FA FA FA) producing the sum S]

Register for both Sum and Carry needed

Critical Path: 1 cell in CSA and 3 in vector merging

Carry Save Adder (CSA) with Carry Look Ahead (CLA)

Pipelining for fast addition

[Figure: pipelined carry save adder rows with register (R) rows, followed by a CLA merging adder built from CLA blocks]

Very Short Critical Path

R = Register

Vector Merging Adder - CLA

The CLA is done in blocks
A common maximum is 4 bits per block
Larger blocks are too complex

[Figure: three 3-bit CLA blocks with inputs A0..A8 and B0..B8, block carries Co,2, Co,5 and Co,8, and per-bit carry C, propagate P and sum S signals for bits 0..8]

Generate & Propagate

[Figure: full adder (FA) with inputs A, B and Ci and outputs S and Co]

A B Ci | S Co | Carry status
0 0 0  | 0 0  | Delete
0 0 1  | 1 0  | Delete
0 1 0  | 1 0  | Propagate
0 1 1  | 0 1  | Propagate
1 0 0  | 1 0  | Propagate
1 0 1  | 0 1  | Propagate
1 1 0  | 0 1  | Generate
1 1 1  | 1 1  | Generate

Functions of A and B

Delete: $D = \bar{A}\,\bar{B}$
Propagate: $P = A \oplus B$
Generate: $G = AB$

$$S = A \oplus B \oplus C_i = P \oplus C_i$$

A B | A+B
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

$$C_o = AB + AC_i + BC_i = AB + (A + B)C_i = AB + (AB + A \oplus B)C_i = G + PC_i$$

The $AB$ term inside the parenthesis is redundant.
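The generate and propagate form of the full adder can be checked exhaustively against the ordinary majority carry. This is only a sketch; the function names are hypothetical.

```python
from itertools import product


def carry_majority(a: int, b: int, c_in: int) -> int:
    """Carry out written as AB + AC + BC."""
    return (a & b) | (a & c_in) | (b & c_in)


def carry_generate_propagate(a: int, b: int, c_in: int) -> int:
    """Carry out written as G + P*C with G = AB and P = A xor B."""
    g = a & b
    p = a ^ b
    return g | (p & c_in)


# Collect every input combination where the two forms disagree.
mismatches = [
    (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
    if carry_majority(a, b, c) != carry_generate_propagate(a, b, c)
]
print(mismatches)
```

An empty list means that the two expressions agree for all eight rows of the truth table.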

Carry Look Ahead (CLA)

$$C_{o,0} = G_0 + P_0 C_{i,0}$$

$$C_{o,1} = G_1 + P_1 C_{o,0} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$$

$$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$$

$$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$$

[Figure: 4-bit adder with the carry outputs Co,0, Co,1, Co,2 and Co,3]
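The four carry equations can be written out directly in code. This is a behavioural sketch of one 4-bit CLA block, with the hypothetical names `cla_carries` and `cla_add4`. Every carry is a flat expression in G, P and the carry input, so no carry waits for another one.

```python
def cla_carries(g, p, c_in):
    """Carry outputs Co,0..Co,3 computed directly from G, P and Ci,0."""
    c0 = g[0] | (p[0] & c_in)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c0, c1, c2, c3]


def cla_add4(a: int, b: int, c_in: int = 0):
    """Add two 4-bit unsigned numbers, returns (4-bit sum, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    carries = cla_carries(g, p, c_in)
    carry_into_bit = [c_in] + carries[:3]
    total = sum((p[i] ^ carry_into_bit[i]) << i for i in range(4))
    return total, carries[3]


print(cla_add4(9, 8))
```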

Carry Look Ahead (CLA): Precharged

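$$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$$

$$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$$

[Figure: precharged CLA gate with clock φ and inputs Ci, G0..G3 and P0..P3]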

Carry Look Ahead (CLA)

[Figure: precharged CLA structure with clock φ and inputs Ci, G0..G3 and P0..P3, followed by an alternative structure with the same inputs]

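Carry Look Ahead (CLA): Manchester

$$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}$$

[Figure: Manchester carry chain between VDD and ground, with the pass transistors P0..P3 in series from Ci,0 to Co,3 and the generate transistors G0..G3]

$$C_{o,0} = G_0 + P_0 C_{i,0}$$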

Logarithmic Adder

Look ahead one step:

$$C_{o,1} = G_1 + P_1 C_{o,0} = G_1 + G_0 P_1 + P_1 P_0 C_{i,0} = G_{1:0} + P_{1:0} C_{i,0}$$

with generate $G_{1:0} = G_1 + G_0 P_1$ and propagate $P_{1:0} = P_1 P_0$.

Look ahead two steps:

$$C_{o,2} = G_2 + P_2 C_{o,1} = G_2 + P_2 (G_1 + G_0 P_1) + P_2 P_1 P_0 C_{i,0} = G_2 + P_2 G_{1:0} + P_2 P_{1:0} C_{i,0}$$

$$C_{o,3} = G_3 + P_3 C_{o,2} = G_3 + P_3 (G_2 + G_1 P_2) + P_3 P_2 P_1 C_{o,0} = G_3 + P_3 G_{2:1} + P_3 P_{2:1} C_{o,0}$$

with generate $G_{2:1} = G_2 + G_1 P_2$ and propagate $P_{2:1} = P_2 P_1$.

Logarithmic Adder, 4 bit

[Figure: four P&G creation blocks with inputs A0..A3 and B0..B3 and outputs G0..G3 and P0..P3, followed by $G_{i:j}$, $P_{i:j}$ creation cells]

$$G_{1:0} = G_1 + G_0 P_1, \quad P_{1:0} = P_1 P_0$$

$$G_{2:1} = G_2 + G_1 P_2, \quad P_{2:1} = P_2 P_1$$

$$C_{o,0} = G_0 + P_0 C_{i,0}$$

$$C_{o,1} = G_{1:0} + P_{1:0} C_{i,0}$$

$$C_{o,2} = G_2 + P_2 G_{1:0} + P_2 P_{1:0} C_{i,0}$$

$$C_{o,3} = G_3 + P_3 G_{2:1} + P_3 P_{2:1} C_{o,0}$$

Logarithmic Adder, 16 bit

[Figure: sixteen P&G creation blocks with inputs A0..A15 and B0..B15, followed by four levels of look ahead cells (one step, two step, four step and eight step look ahead) producing the carries Co,0..Co,15]

An N bit adder is computed in log2(N) stages

Kogge-Stone adder
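The look ahead levels can be sketched as a parallel prefix computation. This is a behavioural model only, assuming that the carry input is folded into the generate signal of bit 0. The name `kogge_stone_add` and its parameters are hypothetical. The look ahead distance doubles in every stage, which gives log2(N) stages.

```python
def kogge_stone_add(a: int, b: int, width: int = 16, c_in: int = 0):
    """Returns (sum, carry out, number of look ahead stages)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate
    group_g = g[:]
    group_p = p[:]
    group_g[0] = g[0] | (p[0] & c_in)  # assumption: carry in folded into bit 0
    distance, stages = 1, 0
    while distance < width:
        next_g, next_p = group_g[:], group_p[:]
        for i in range(distance, width):
            # Combine the group ending at bit i with the group `distance` bits below.
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - distance])
            next_p[i] = group_p[i] & group_p[i - distance]
        group_g, group_p = next_g, next_p
        distance *= 2
        stages += 1
    # group_g[i] is now the carry out of bit i.
    carry_into = [c_in] + group_g[:-1]
    total = sum((p[i] ^ carry_into[i]) << i for i in range(width))
    return total, group_g[-1], stages


print(kogge_stone_add(0x1234, 0x0FF1))
```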


Other logarithmic adders

Kogge-Stone: 17 cells

Brent-Kung: 12 cells, fan out 2

Sklansky adder: large fan out

Carry Bypass

[Figure: full adder (FA) with inputs A, B and Ci and outputs S and Co]

A B Ci | S Co | Carry status
0 0 0  | 0 0  | Delete
0 0 1  | 1 0  | Delete
0 1 0  | 1 0  | Propagate
0 1 1  | 0 1  | Propagate
1 0 0  | 1 0  | Propagate
1 0 1  | 0 1  | Propagate
1 1 0  | 0 1  | Generate
1 1 1  | 1 1  | Generate

$A \neq B$ gives $C_o = C_i$

Bypass carry if $P = 1$, where $P = A \oplus B$ (Propagate)

$A = B$ gives $C_o$ independent of $C_i$

Carry Bypass

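[Figure: two full adders with inputs A0, B0 and A1, B1, propagate signals P0 and P1, and carries Ci,0, Co,0 and Co,1]

$C_{o,1} = C_{i,0}$ if $A_0 \neq B_0$ and $A_1 \neq B_1$, that is if $P_0 P_1 = 1$

otherwise $C_{o,1}$ is independent of $C_{i,0}$

Bypass carry when $P_0 P_1 = 1$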

Carry Bypass Adder

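[Figure: four full adders with generate signals G0..G3, propagate signals P0..P3, carries C0..C3 and sums S0..S3, and a bypass path selecting the carry output Co,3]

Bypass if $P_0 P_1 P_2 P_3 = 1$

Otherwise Co,3 is independent of the incoming carry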

Carry Bypass Adder

If A = B in at least one adder cell ⇒ Co not dependent on Ci

[Figure: 16-bit carry bypass adder built from four 4-bit stages, each with a setup block and four FA cells, producing S0..S15]

If A ≠ B in all adders ⇒ Bypass Carry

Carry Select

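[Figure: 4-bit carry select stage with a setup block, two FA chains computing with carry input "0" and carry input "1", a multiplexer controlled by Ci,k that selects Co,k+3, and sum generation for S0..S3]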

Carry Select: Critical Path

[Figure: 16-bit carry select adder with four 4-bit stages, each with a setup block, two FA chains (carry input 0 and 1) and sum generation; the critical path runs through Ci,0, Co,3, Co,7 and Co,11 to S15]

Large area (two adders not needed in first stage)

Linear Carry Select

[Figure: 16-bit linear carry select adder with four 4-bit stages, carries Ci,0, Co,3, Co,7 and Co,11, and sum generation for S0..S15]

The same number of bits in each stage

Square Root Carry Select

[Figure: square root carry select adder with stages of increasing length (2, 3, 4 and 5 bits), carries Ci,0, Co,1, Co,4 and Co,8, and sum generation for S0..S13]

Multiplication

The steps involved in multiplication
  Partial product generation
  Accumulate the partial products

The maximum speed is O(log2 W)

Multipliers

Iterative multipliers
  One or a few partial products are processed each clock cycle
  Small area
  Slow

Hardware mapped multipliers
  A complete multiplication each clock cycle
  Large area
  Fast

Iterative Multiplication

A simple multiplier
Applicable to both Carry Ripple and Carry Save

$$P = A \times B = A \times \sum_{i=0}^{3} b_i \times 2^i$$

Unsigned Multiplication

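$$P = A b_3 \times 2^3 + A b_2 \times 2^2 + A b_1 \times 2^1 + A b_0 \times 2^0$$

$$\begin{array}{ccccccc}
 & & & a_3 & a_2 & a_1 & a_0 \\
 & & & b_3 & b_2 & b_1 & b_0 \\ \hline
 & & & a_3 b_0 & a_2 b_0 & a_1 b_0 & a_0 b_0 \\
 & & a_3 b_1 & a_2 b_1 & a_1 b_1 & a_0 b_1 & \\
 & a_3 b_2 & a_2 b_2 & a_1 b_2 & a_0 b_2 & & \\
a_3 b_3 & a_2 b_3 & a_1 b_3 & a_0 b_3 & & & \\ \hline
p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0
\end{array}$$

Shifted partial products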
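The shifted partial products correspond to a plain shift and add loop. This is a sketch with the hypothetical name `unsigned_multiply`; each pass of the loop forms one row of the scheme and accumulates it.

```python
def unsigned_multiply(a: int, b: int, width: int = 4) -> int:
    """Multiply two unsigned `width`-bit numbers by adding shifted partial products."""
    product = 0
    for i in range(width):
        b_i = (b >> i) & 1
        # Row i: every bit of A is ANDed with b_i, then the row is shifted i steps.
        partial_product = (a if b_i else 0) << i
        product += partial_product
    return product


print(unsigned_multiply(13, 11))
```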

Unsigned Multiplication

$$P = A b_3 \times 2^3 + A b_2 \times 2^2 + A b_1 \times 2^1 + A b_0 \times 2^0$$

Rows in Multiplier, where $pp_j^i$ is the partial sum bit $j$ after row $i$:

a3 a2 a1 a0
b3 b2 b1 b0
a3 b0  a2 b0  a1 b0  a0 b0
a3 b1  a2 b1  a1 b1  a0 b1
$pp_3^1$  $pp_2^1$  $pp_1^1$  $pp_0^1$
a3 b2  a2 b2  a1 b2  a0 b2
$pp_3^2$  $pp_2^2$  $pp_1^2$  $pp_0^2$
a3 b3  a2 b3  a1 b3  a0 b3
p6 p5 p4 p3 p2 p1 p0

Array Multiplier

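[Figure: basic cells of the array multiplier: an FA cell and an HA cell, each forming the partial product from xi and yj, with carry signals Co and Ci and sum S; one row HA FA FA HA for the inputs x3..x0 and yj]

Array Multiplier

[Figure: 4 x 4 array multiplier with inputs a3..a0 and b0..b3, rows HA FA FA HA, FA FA FA HA and FA FA FA HA, and outputs p6..p0; the bit multiplication cell combines aj, bi, the incoming partial sum $pp_{j-1}^{i-1}$ and the carry in an FA and outputs $pp_j^i$]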

Array Multiplier: Critical Paths

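[Figure: array multiplier rows HA FA FA HA, FA FA FA HA and FA FA FA HA with the critical paths marked]

Carry Save Multiplier

Only one critical path
One extra adder
Suitable for CLA

[Figure: carry save multiplier with rows HA HA HA HA, HA FA FA FA and HA FA FA FA, followed by a vector merging adder]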

Pipelining

[Figure: pipelined array multiplier with inputs x3..x0 and y0..y3, rows HA FA FA HA, FA FA FA HA and FA FA FA HA, and outputs z6..z0]

Pipelining

[Figure: pipelined carry save multiplier with rows HA HA HA HA, HA FA FA FA and HA FA FA FA and a final adder row]

Multiplier Floorplan

[Figure: floorplan of the multiplier with rows of HA and FA cells]

Two’s Complement (Horner’s Rule)

$$\begin{aligned}
A \times B = {} & b_0 \times 2^0 (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) + {} \\
& b_1 \times 2^1 (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) + {} \\
& b_2 \times 2^2 (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0) - {} \\
& b_3 \times 2^3 (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0)
\end{aligned}$$

The negative terms $-a_3 2^3$ in the first three rows are solved by sign extension.

The last row, which is subtracted, needs to be rewritten.

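Two’s Complement (Horner’s Rule)

$$\begin{aligned}
-b_3 \times 2^3 (-a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0)
&= b_3 \times 2^3 (a_3 2^3 - a_2 2^2 - a_1 2^1 - a_0) \\
&= b_3 \times 2^3 (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0 + 1) \\
&= b_3 \times 2^3 (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0) + b_3 2^3
\end{aligned}$$

The bits of A are complemented and a one is added at the LSB position of the row.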
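The rewriting of the last row can be checked numerically. The sketch below is an assumption about how the rule maps to code, with the hypothetical names `to_signed` and `multiply_twos_complement`. The rows for b0..b2 add A directly, while the row for the sign bit b3 adds the complemented A plus a one at the LSB position of that row.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret an unsigned `width`-bit pattern as a two's complement number."""
    return value - (1 << width) if value >> (width - 1) else value


def multiply_twos_complement(a: int, b: int, width: int = 4) -> int:
    """Multiply two signed `width`-bit numbers with the last row complemented."""
    mask = (1 << width) - 1
    b_bits = b & mask
    # Value of the bitwise complement of A, read as a two's complement number.
    a_complemented = to_signed(~a & mask, width)
    product = 0
    for i in range(width - 1):
        if (b_bits >> i) & 1:
            product += a << i  # rows for b0..b(width-2) have positive weight
    if (b_bits >> (width - 1)) & 1:
        last = width - 1
        # Sign row: complemented A plus an "LSB one", both shifted to the row position.
        product += (a_complemented << last) + (1 << last)
    return product


print(multiply_twos_complement(-3, -5))
```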

Multiplication (Horner’s Rule)

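$$\begin{array}{cccccccl}
 & & & a_3 & a_2 & a_1 & a_0 & \\
 & & & b_3 & b_2 & b_1 & b_0 & \\ \hline
 & & & -a_3 b_0 & a_2 b_0 & a_1 b_0 & a_0 b_0 & [A \times b_0] \times 2^0 \\
 & & -a_3 b_1 & a_2 b_1 & a_1 b_1 & a_0 b_1 & & [A \times b_1] \times 2^1 \\
 & -a_3 b_2 & a_2 b_2 & a_1 b_2 & a_0 b_2 & & & \\
-\bar{a}_3 b_3 & \bar{a}_2 b_3 & \bar{a}_1 b_3 & \bar{a}_0 b_3 & & & & b_3 \times 2^3 (-\bar{a}_3 2^3 + \bar{a}_2 2^2 + \bar{a}_1 2^1 + \bar{a}_0) \\
 & & & b_3 & & & & +\, b_3 2^3 \\ \hline
p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0 &
\end{array}$$

Negative MSBs solved with sign extension, one in each partial product

Not used if the result is truncated

Note: Carry Ripple

The last row is complemented and an “LSB one” is added

[Figure: array with the sign extension cells marked; labels $a_3 b_0 + a_3 b_1$ and $pp_3^1 + a_3 b_2$]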


Using Carry Save and Vector Merging Adder

CSA Cell

[Figure: FA cell with inputs A, B and C (from stage i-1) and outputs S and C (to stage i+1)]

Counts the number of ones at the input and compresses it to a binary number

Often called:
3-2 compressor
(3, 2) counter

Others are e.g. (2, 2), (7, 3) …

Used to form CSA trees
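A word level sketch of the (3, 2) counter, assuming unsigned Python integers as words. The names `carry_save_add` and `add_four` are hypothetical. Three words are compressed to a sum word and a carry word without any carry propagation; only the final vector merging addition propagates carries.

```python
def carry_save_add(a: int, b: int, c: int):
    """(3, 2) counter applied to every bit position: returns (sum word, carry word)."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # carries move one position up
    return sum_word, carry_word


def add_four(a: int, b: int, c: int, d: int) -> int:
    """Two CSA levels followed by a vector merging adder."""
    s, cy = carry_save_add(a, b, c)
    s, cy = carry_save_add(s, cy, d)
    return s + cy  # vector merging adder


print(add_four(3, 5, 7, 9))
```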

Wallace tree

Four 4-bit words to add

HAs: (2, 2) counters
FAs

[Figure: dot diagram over the bit positions 6..0 showing the first stage, the first stage result, the second stage and the second stage result (sum and carry)]

Wallace tree

[Figure: the corresponding HA and FA cells over the bit positions 6..0, followed by a CLA]

Six adders (12 in CSA)
Very high speed!

Pipelined Wallace tree

[Figure: the Wallace tree over the bit positions 6..0 with a register (R) row inserted before the CLA]

Very often combined with Booth’s modified encoding

64 Bit Wallace Tree Multiplier

Booth’s Modified Algorithm

Recode binary numbers $x_i \in \{0, 1\}$ to $y_i \in \{-2, -1, 0, 1, 2\}$

Five possible digits in $y_i$: radix 5? Overlapping radix 4 method

Five digits require coding by 3 binary bits

$$X = -x_{k-1} \times 2^{k-1} + \sum_{i=0}^{k-2} x_i \times 2^i, \quad x_i \in \{0, 1\}$$

Example $k = 6$:

$$X = -32x_5 + 16x_4 + 8x_3 + 4x_2 + 2x_1 + x_0$$

$$X = 16(-2x_5 + x_4 + x_3) + 4(-2x_3 + x_2 + x_1) + (-2x_1 + x_0 + x_{-1}), \quad x_{-1} = 0$$

If $y_i = -2x_{i+1} + x_i + x_{i-1}$ then

$$X = Y = 16y_4 + 4y_2 + y_0, \quad y_i \in \{-2, -1, 0, 1, 2\}$$

$$Y = \sum_{i=0,\ i\ \text{even}}^{k-2} y_i \times 2^i \;\Rightarrow\; Y = \sum_{n=0}^{k/2-1} y_{2n} \times 4^n \quad \text{(i.e. Radix 4)}$$


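$$y_i = -2x_{i+1} + x_i + x_{i-1}$$

x_{i+1} x_i x_{i-1} | y_i
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 2
1 0 0 | -2
1 0 1 | -1
1 1 0 | -1
1 1 1 | 0

Examples (a bar denotes a negative digit):

$X = 01\ 11\ 01\ 10\ (0) \Rightarrow Y = 02\ 0\bar{1}\ 02\ 0\bar{2}$

$X = 00\ 10\ 01\ 11\ (0) \Rightarrow Y = 01\ 0\bar{2}\ 02\ 0\bar{1}$

$X = 10\ 11\ 10\ 10\ (0) \Rightarrow Y = 0\bar{1}\ 00\ 0\bar{1}\ 0\bar{2}$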

There will always be at least one “0” in each pair
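The recoding rule can be tried on the three example words. This is a sketch with the hypothetical names `booth_radix4_digits` and `digits_value`; it assumes a two's complement bit string with an even number of bits, given MSB first.

```python
def booth_radix4_digits(bits: str):
    """Radix 4 Booth digits (MSB first) of a two's complement bit string."""
    x = [int(ch) for ch in bits] + [0]  # append x_-1 = 0 to the right of the LSB
    digits = []
    for pos in range(0, len(bits), 2):
        # Overlapping triplet x_(i+1), x_i, x_(i-1): y_i = -2*x_(i+1) + x_i + x_(i-1)
        digits.append(-2 * x[pos] + x[pos + 1] + x[pos + 2])
    return digits


def digits_value(digits) -> int:
    """Value of a radix 4 digit list given MSB first."""
    value = 0
    for digit in digits:
        value = 4 * value + digit
    return value


for word in ("01110110", "00100111", "10111010"):
    recoded = booth_radix4_digits(word)
    print(word, recoded, digits_value(recoded))
```

The printed value of each digit list equals the two's complement value of the original word, so the recoding changes the representation but not the number.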


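Ordinary multiplication, 5 × 7:

0 1 0 1 (5) × 0 1 1 1 (7)
0 1 0 1 : 1 × 5
0 1 0 1 shifted one step: 2 × 5
0 1 0 1 shifted two steps: 4 × 5
0 0 0 0 shifted three steps: 0 × 5
Sum: 0 0 1 0 0 0 1 1

With Booth recoding, 7 = 0 1 1 1 becomes the digits 2, -1:

0 1 0 1 (5) × 2 -1 (7)
1 1 1 1 1 0 1 1 : -5
0 1 0 1 shifted two steps: 2 × 4 × 5
Sum: 0 0 1 0 0 0 1 1

-1 ⇒ two’s complement conversion
2 ⇒ shift one step (multiply by two)
-2 ⇒ two’s complement conversion + shift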

[Figure: Booth multiplier floorplan with Booth coders (inputs yj-1, yj, yj+1 and yj+1, yj+2, yj+3), rows of Booth MUXes selecting 1× or 2× of the inputs Xi, Xi+1, Xi+2, and rows of adders]

Booth Muxes

Booth Coders (one cell)

Adders

Adder/Subtractor

[Figure: four full adders with inputs A0..A3 and B0..B3; each B input passes through an XOR gate controlled by CTRL]

CTRL B | XOR
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

Overflow

[Figure: the correct sum compared with the 3-bit two’s complement sum over the range 1..3]

Overflow changes the sign

Increase the dynamic range
  Larger area (more adder cells)
Scale down
  Decreases the dynamic range
Use saturation logic
  Often a good solution

Overflow

Increase the dynamic range

3-bit words: 0 1 1 (3) + 0 1 1 (3) = 1 1 0 (-2)

4-bit words: 0 0 1 1 (3) + 0 0 1 1 (3) = 0 1 1 0 (6)

[Figure: two adder structures for the inputs A0..A4, B0..B4, C0..C4 and D0..D4, the second one widened with extra HA cells, with outputs S0..S4 and S0..S6]

Overflow

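Scale down and scale up after

Better than overflow

[Figure: numeric example comparing the overflowing sum 0 1 1 (3) + 0 1 1 (3) = 1 1 0 (-2) with the sum of the scaled down operands]

[Figure: filter f(n) with input x(n) and output y(n), and the same filter scaled by 1/β at the input and by β at the output]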

Overflow - Saturation

[Figure: the correct sum compared with the 3-bit two’s complement sum (overflow changes the sign) and with the 3-bit saturated sum]

Saturation Arithmetic

[Figure: saturation logic fed from the adder by Cin-msb, Cout-msb and the sign bit, producing the saturated output; labels 0 = NOF and 1 = POF]

Overflow if Cout-msb differs from Cin-msb
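The overflow rule can be sketched in code. The function name `saturating_add` is hypothetical, and reading the figure labels as "carry into the MSB equal to 1 means positive overflow" is an assumption based on how two's complement addition behaves.

```python
def saturating_add(a: int, b: int, width: int = 3) -> int:
    """Add two signed `width`-bit numbers and saturate on overflow."""
    mask = (1 << width) - 1
    low_mask = mask >> 1
    a_bits, b_bits = a & mask, b & mask
    # Carry into the MSB position comes from adding all lower bits.
    c_in_msb = ((a_bits & low_mask) + (b_bits & low_mask)) >> (width - 1)
    # Carry out of the MSB position.
    c_out_msb = ((a_bits >> (width - 1)) + (b_bits >> (width - 1)) + c_in_msb) >> 1
    if c_in_msb != c_out_msb:
        # Overflow: clamp to the largest or smallest representable value.
        return (1 << (width - 1)) - 1 if c_in_msb == 1 else -(1 << (width - 1))
    raw = (a_bits + b_bits) & mask
    return raw - (1 << width) if raw >> (width - 1) else raw


print(saturating_add(3, 3), saturating_add(-4, -4), saturating_add(3, -2))
```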

Limit Cycles

Example: recursive filter with zero input

[Figure: filter output with two’s complement arithmetic compared with saturated arithmetic]

Source: Lars Wanhammar, “DSP Integrated circuits”

Fixed Coefficient Multiplication

[Figure: multiplication of a3 a2 a1 a0 by the fixed coefficient 0 1 0 0 1 0; the rows for zero coefficient bits are all zeros, so only the rows a3 a2 a1 a0 for the ones are added, using HA cells, to give p6 p6 p6 p5 p4 p3 p2 p1 p0]

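Serial Addition

Bit-Serial

[Figure a): one adder cell with inputs a_i and b_i, sum output s_i and the carry cout_i fed back through a delay Δ]

Digit-Serial

[Figure b): chained adder cells with inputs a_i, a_{i+1}, a_{i+2} and b_i, b_{i+1}, b_{i+2}, sum outputs s_i, s_{i+1}, s_{i+2}, carries cout_i, cout_{i+1}, cout_{i+2}, and the last carry fed back through a delay Δ]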

Bit-serial Multiplication

[Figure: bit-serial multiplier with a coefficient ROM supplying h0(k), h1(k), h2(k) and h3(k), input a_i with LSB first in and sign extension, and output p_i with LSB first out]

Fixed Coefficient Multiplication

[Figure: bit-serial multipliers with input a_i, output p_i and fixed coefficient bits; the adder cells for zero coefficient bits are removed]

Saves more than 1/2 of the adders on average

Example: Coef. from a Hilbert Filter

Bit-Parallel

[Figure: inputs a0..a9, HA and FA cells, and outputs s0..s15 with the binary point marked]

Bit-Serial

[Figure: two FA cells]

C = 00001101

Signed Digit

A redundant representation where x∈{-1,0,1}

Example:

0 0 0 1 = 0 0 1 -1 = 0 1 -1 -1 ……

A sequence of ones:

0 1 1 1 1 0 = 1 0 0 0 -1 0

16 + 8 + 4 + 2 = 32 - 2

Canonical Signed Digit (CSD)

A sequence of ones can be replaced with:

1. A “-1” at the least significant position of the sequence.

2. A “1” at the position to the left of the most significant position of the sequence.

3. Zeros between the “1” and the “–1”

1 1 1 0 1 0 1 1
1 1 1 0 1 1 0 -1
1 1 1 1 0 -1 0 -1
1 0 0 0 -1 0 -1 0 -1

Saves more than 2/3 of the adder cells on average
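The replacement rule can be sketched as a conversion that works from the LSB upward. This is one common formulation and not necessarily the procedure behind the slide; the name `to_csd` is hypothetical and the sketch assumes a non-negative integer.

```python
def to_csd(value: int):
    """Canonical signed digit list (MSB first) of a non-negative integer."""
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit 1, value mod 4 == 3 starts a run of ones: digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits[::-1] or [0]


print(to_csd(0b11101011))
print(to_csd(0b011110))
```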

Canonical Signed Digit

[Figure: the recoding of 1 1 1 0 1 0 1 1 step by step, and two bit-serial cells, a) with reset and b) with set, with inputs a_i and b_i, carries c_i and c_{i+1}, and outputs s_i, d_i and p_i]

Signed Digit Representation

Booth’s modified algorithm
  For variable coefficients

Canonical Signed Digit
  For fixed coefficients
  Optimal

Distributed Arithmetic

Often used in summation of inner products, for example Discrete Cosine Transform (DCT)

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} =
\begin{bmatrix}
c_2 & c_2 & c_2 & c_2 \\
c_1 & c_3 & -c_3 & -c_1 \\
c_2 & -c_2 & -c_2 & c_2 \\
c_3 & -c_1 & c_1 & -c_3
\end{bmatrix} \times
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix}$$


Sum of inner products

$$Y = \sum_{i=0}^{N-1} c_i x_i = c_0 x_0 + c_1 x_1 + c_2 x_2 \quad (N = 3)$$

$c_i$ are M-bit coefficients and $x_i$ are W-bit numbers:

$$x_i = -x_{i,W-1} + \sum_{j=1}^{W-1} x_{i,W-1-j} \times 2^{-j}$$

where $x_{i,j}$ are the bits in the word $x_i$.

$$\begin{aligned}
Y = \sum_{i=0}^{N-1} c_i x_i
&= \sum_{i=0}^{N-1} c_i \Big( -x_{i,W-1} + \sum_{j=1}^{W-1} x_{i,W-1-j} \times 2^{-j} \Big) \\
&= -\sum_{i=0}^{N-1} c_i x_{i,W-1} + \sum_{i=0}^{N-1} \Big[ \sum_{j=1}^{W-1} c_i x_{i,W-1-j} \times 2^{-j} \Big] \\
&= -\sum_{i=0}^{N-1} c_i x_{i,W-1} + \sum_{j=1}^{W-1} \Big[ \sum_{i=0}^{N-1} c_i x_{i,W-1-j} \Big] \times 2^{-j}
\end{aligned}$$

In the last line the summation order is interchanged, so that bits with the same bit weight are summed together.

Example: Distributed Arithmetic

Traditional summation order

$$\begin{aligned}
Y = c_0 x_0 + c_1 x_1 + c_2 x_2 = {} & -c_0 x_{0,2} + c_0 x_{0,1} 2^{-1} + c_0 x_{0,0} 2^{-2} + {} \\
& -c_1 x_{1,2} + c_1 x_{1,1} 2^{-1} + c_1 x_{1,0} 2^{-2} + {} \\
& -c_2 x_{2,2} + c_2 x_{2,1} 2^{-1} + c_2 x_{2,0} 2^{-2}
\end{aligned}$$

Note: $c_i$ are M-bit constants and $x_{i,j}$ are single bits

Interchanged summation order

$$\begin{aligned}
Y = c_0 x_0 + c_1 x_1 + c_2 x_2 = {} & -c_0 x_{0,2} + c_0 x_{0,1} 2^{-1} + c_0 x_{0,0} 2^{-2} + {} \\
& -c_1 x_{1,2} + c_1 x_{1,1} 2^{-1} + c_1 x_{1,0} 2^{-2} + {} \\
& -c_2 x_{2,2} + c_2 x_{2,1} 2^{-1} + c_2 x_{2,0} 2^{-2}
\end{aligned}$$

The first column contains the sign bits.

Interchanged summation order (rewritten)

$$\begin{aligned}
Y = {} & -(c_0 x_{0,2} + c_1 x_{1,2} + c_2 x_{2,2}) + {} \\
& (c_0 x_{0,1} + c_1 x_{1,1} + c_2 x_{2,1}) \times 2^{-1} + {} \\
& (c_0 x_{0,0} + c_1 x_{1,0} + c_2 x_{2,0}) \times 2^{-2}
\end{aligned}$$

The first row contains the sign bits.

x0,j x1,j x2,j | ROM
0 0 0 | 0
0 0 1 | c2
0 1 0 | c1
0 1 1 | c1+c2
1 0 0 | c0
1 0 1 | c0+c2
1 1 0 | c0+c1
1 1 1 | c0+c1+c2

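Shift Accumulator

[Figure: the bits x0,j, x1,j and x2,j (LSB first) address a $2^N$ word ROM whose output is added into a shift accumulator register (REG)]

x0,j x1,j x2,j | ROM
0 0 0 | 0
0 0 1 | c2
0 1 0 | c1
0 1 1 | c1+c2
1 0 0 | c0
1 0 1 | c0+c2
1 1 0 | c0+c1
1 1 1 | c0+c1+c2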

Example with $x_0 = 0.11$, $x_1 = 0.01$, $x_2 = 0.10$ and $c_0 = 0.00$, $c_1 = 0.01$, $c_2 = 0.10$:

x0,j x1,j x2,j | ROM  | Coeff.
0 0 0 | 0.00 | 0
0 0 1 | 0.10 | c2
0 1 0 | 0.01 | c1
0 1 1 | 0.11 | c1+c2
1 0 0 | 0.00 | c0
1 0 1 | 0.10 | c0+c2
1 1 0 | 0.01 | c0+c1
1 1 1 | 0.11 | c0+c1+c2

$$\text{Sum} = rom_0 + \tfrac{1}{2}\, rom_5 + \tfrac{1}{4}\, rom_6 = 0.00 + \tfrac{1}{2} \times 0.10 + \tfrac{1}{4} \times 0.01 = 0.0101$$
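The ROM and shift accumulator can be modelled with exact fractions. This is a behavioural sketch under the assumption that the words are given as bit strings with the sign bit first; the names `build_rom` and `distributed_arithmetic` are hypothetical.

```python
from fractions import Fraction


def build_rom(coeffs):
    """ROM word for every address x0,j x1,j x2,j ...: the sum of the selected coefficients."""
    n = len(coeffs)
    rom = []
    for address in range(2 ** n):
        total = Fraction(0)
        for i, c in enumerate(coeffs):
            if (address >> (n - 1 - i)) & 1:  # x0,j is the most significant address bit
                total += c
        rom.append(total)
    return rom


def distributed_arithmetic(coeffs, x_words):
    """Inner product of coeffs and x_words, processing one bit column per step, LSB first."""
    rom = build_rom(coeffs)
    width = len(x_words[0])
    acc = Fraction(0)
    for column in range(width - 1, 0, -1):  # string index width-1 holds the LSB
        address = int("".join(word[column] for word in x_words), 2)
        acc = (acc + rom[address]) / 2  # add the ROM word, then shift one step
    sign_address = int("".join(word[0] for word in x_words), 2)
    return acc - rom[sign_address]  # the sign bit column has negative weight


coeffs = [Fraction(0), Fraction(1, 4), Fraction(1, 2)]  # c0 = 0.00, c1 = 0.01, c2 = 0.10
x_words = ["011", "001", "010"]                         # x0 = 0.11, x1 = 0.01, x2 = 0.10
print(distributed_arithmetic(coeffs, x_words))
```

With the values of the example the result is 5/16, which is 0.0101 in binary.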

Restoring Division

436 / 15, where 0110110100 = 436 and 01111 = 15. The first subtraction uses 15 × 2^5 = 480.

Operation | Partial remainder | Quotient | Sign
436 - 480, Subtract | -44 | 0 | Negative
+ 480, Restore (Add) | 436 | 0 |
- 240, Shift&Sub | 196 | 01 | Positive
- 120, Shift&Sub | 76 | 011 | Positive
- 60, Shift&Sub | 16 | 0111 | Positive
- 30, Shift&Sub | -14 | 01110 | Negative
+ 30, Restore (Add) | 16 | 01110 |
- 15, Shift&Sub | 1 | 011101 | Positive

Quotient: 011101 = 29

Remainder: 000001

Restoring Division

The same division in binary, with 01111 shifted one step to the right for every Shift&Sub:

Operation | Partial remainder | Quotient | Sign
0110110100 - 01111, Subtract | 1111010100 | 0 | Negative
+ 01111, Restore (Add) | 0110110100 | 0 |
- 01111, Shift&Sub | 011000100 | 01 | Positive
- 01111, Shift&Sub | 01001100 | 011 | Positive
- 01111, Shift&Sub | 0010000 | 0111 | Positive
- 01111, Shift&Sub | 110010 | 01110 | Negative
+ 01111, Restore (Add) | 010000 | 01110 |
- 01111, Shift&Sub | 000001 | 011101 | Positive

Quotient: 011101 = 29

Remainder: 000001
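The steps of the example can be sketched as a loop. The name `restoring_divide` and the parameter `quotient_bits` are hypothetical; the sketch assumes unsigned operands and that the quotient fits in `quotient_bits` bits.

```python
def restoring_divide(numerator: int, denominator: int, quotient_bits: int):
    """Restoring division of unsigned integers, returns (quotient, remainder)."""
    remainder = numerator
    quotient = 0
    for shift in range(quotient_bits - 1, -1, -1):
        remainder -= denominator << shift      # Shift&Sub
        if remainder < 0:
            remainder += denominator << shift  # Negative: Restore (Add), quotient bit 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1     # Positive: quotient bit 1
    return quotient, remainder


print(restoring_divide(436, 15, 6))
```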

Non-restoring Division

Restoring: 1. Add the denominator 2. Subtract half of it

436 - 480 = -44, Subtract, Negative
-44 + 480 = 436, Restore (Add)
436 - 240 = 196, Shift&Sub

Non-restoring: 1. Add half of the denominator

436 - 480 = -44, Subtract, Negative
-44 + 240 = 196, Shift&Add

Non-restoring Division

Restoring: 1. Add the denominator 2. Subtract half of it

0110110100 - 01111 = 1111010100, Subtract, Negative
1111010100 + 01111 = 0110110100, Restore (Add)
0110110100 - 01111 = 011000100, Shift&Sub

Non-restoring: 1. Add half of the denominator

0110110100 - 01111 = 1111010100, Subtract, Negative
1111010100 + 01111 = 011000100, Shift&Add

Array Divider Non-restoring

[Figure: one row of the array divider: four FA cells with inputs A0..A3 and B0..B3, where XOR gates controlled by CTRL select ADD/SUB after the shift]

Division by Reciprocation

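To compute $q = \dfrac{z}{d}$:

compute $1/d$

multiply $q = z \times \dfrac{1}{d}$

Particularly efficient when several divisions by $d$ are needed:

$$\begin{bmatrix} \dfrac{a}{d} & \dfrac{b}{d} \\[2mm] \dfrac{c}{d} & \dfrac{e}{d} \end{bmatrix} = \frac{1}{d} \begin{bmatrix} a & b \\ c & e \end{bmatrix}$$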

Newton Raphson: Efficient 1/d computing

$$f(x) = \frac{1}{x} - d; \quad f'(x) = -\frac{1}{x^2}$$

$$x(i+1) = x(i) - \frac{f(x(i))}{f'(x(i))} = x(i) - \frac{\dfrac{1}{x(i)} - d}{-\dfrac{1}{x(i)^2}} = x(i) + x(i) - d\,x(i)^2$$

$$x(i+1) = 2x(i) - d\,x(i)^2$$

Convergence Speed up in NR

Convergence is slow in the beginning
  The number of bits doubles each iteration

Speedup is possible
  Use lookup table to set start value
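The iteration is short enough to try directly. This is a floating point sketch with the hypothetical name `reciprocal`; the start value `x0` plays the role of the lookup table entry and has to be reasonably close to 1/d for the iteration to converge.

```python
def reciprocal(d: float, x0: float, iterations: int = 5) -> float:
    """Approximate 1/d with the iteration x(i+1) = 2*x(i) - d*x(i)^2."""
    x = x0
    for _ in range(iterations):
        x = x * (2 - d * x)  # same as 2x - d*x^2, using only multiply and subtract
    return x


print(reciprocal(0.75, 1.0))
```

The relative error is squared in every iteration, which is why the number of correct bits doubles each time.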

The CORDIC Algorithm

Iterative algorithm for circular rotations
  Example: Derive sine, cosine …

No multiplications

CORDIC: COordinate Rotation DIgital Computer

Presented by Jack E. Volder 1959

Real Rotation

$$x_{i+1} = x_i \cos\alpha_i - y_i \sin\alpha_i = \cos\alpha_i (x_i - y_i \tan\alpha_i)$$

$$y_{i+1} = y_i \cos\alpha_i + x_i \sin\alpha_i = \cos\alpha_i (y_i + x_i \tan\alpha_i)$$

Find the x, y coordinates for a given angle $\alpha$

Example, with $(x_0, y_0) = (1, 0)$:

$$x_1 = \cos\alpha_0 (x_0 - y_0 \tan\alpha_0) = \cos\alpha_0 \times x_0 = k_0 \times x_0$$

$$y_1 = \cos\alpha_0 (y_0 + x_0 \tan\alpha_0) = \cos\alpha_0 \times x_0 \tan\alpha_0 = k_0 \times x_0 \tan\alpha_0$$

[Figure: true rotation of $(x_0, y_0) = (1, 0)$ by the angle $\alpha$ to $(x_1, y_1)$ on the unit circle]

The rotation angle is restricted to $\tan\alpha_i = \pm 2^{-i}$, i.e. a shift

Real Rotation

$$x_{i+1} = x_i \cos\alpha_i - y_i \sin\alpha_i = \cos\alpha_i (x_i - y_i \tan\alpha_i) = k_i (x_i - y_i \tan\alpha_i) = k_i (x_i - y_i \times d_i 2^{-i})$$

$$y_{i+1} = y_i \cos\alpha_i + x_i \sin\alpha_i = \cos\alpha_i (y_i + x_i \tan\alpha_i) = k_i (y_i + x_i \tan\alpha_i)$$

with $d_i = \pm 1$.

However, multiplication with a constant ($k_i = \cos\alpha_i$)

CORDIC: Pseudo Rotation

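$$x_{i+1} = x_i - y_i \tan\alpha_i$$

$$y_{i+1} = y_i + x_i \tan\alpha_i$$

Example, with $(x_0, y_0) = (1, 0)$:

$$x_1 = x_0 - y_0 \tan\alpha_0 = 1$$

$$y_1 = y_0 + x_0 \tan\alpha_0 = \tan\alpha_0$$

No multiplication

However, the length $R_{i+1} > R_i$:

$$R_{i+1} = \frac{R_i}{\cos\alpha_i} = R_i \sqrt{1 + \tan^2\alpha_i}, \quad \text{since } \frac{1}{\cos^2\alpha_i} = 1 + \tan^2\alpha_i$$

[Figure: pseudo rotation compared with true rotation on the unit circle, starting at $(x_0, y_0) = (1, 0)$, with the lengths $R_i$ and $R_{i+1}$]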

CORDIC: Pseudo Rotation

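The angle $\alpha$ is known

Derive x, y using three iterations where

$$\alpha - \alpha_0 - \alpha_1 - \alpha_2 \rightarrow 0$$

$$\alpha = 87: \quad 87 - 45.0 - 26.6 - 14.0 = 1.4$$

[Figure: three pseudo rotations by $\alpha_0$, $\alpha_1$ and $\alpha_2$ from $(x_0, y_0)$ to $(x_3, y_3)$, with the lengths $R_0$, $R_1$, $R_2$ and $R_3$]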

CORDIC: Three Iterations

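The vector length R is increasing each iteration

$$R_0 = 1$$

$$R_1 = R_0 \sqrt{1 + \tan^2\alpha_0} = \sqrt{1 + \tan^2 45^\circ} = \sqrt{2} = 1.41$$

$$R_2 = R_1 \sqrt{1 + \tan^2\alpha_1} = \sqrt{2}\,\sqrt{1 + \tan^2 26.6^\circ} = \sqrt{\frac{5}{2}} = 1.58$$

$$R_3 = R_2 \sqrt{1 + \tan^2\alpha_2} = \sqrt{\frac{5}{2}}\,\sqrt{1 + \tan^2 14.0^\circ} = \sqrt{\frac{85}{32}} = 1.63$$

[Figure: three pseudo rotations from $(x_0, y_0)$ to $(x_3, y_3)$ for $\alpha = 87$, with the lengths $R_0$..$R_3$]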

CORDIC Derive x3,y3

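$$\tan\alpha_0 = 1; \quad \tan\alpha_1 = \frac{1}{2}; \quad \tan\alpha_2 = \frac{1}{4}$$

$$x_{i+1} = x_i - y_i \tan\alpha_i$$

$$y_{i+1} = y_i + x_i \tan\alpha_i$$

$$\begin{cases} x_1 = x_0 - y_0 \times 1 = 1 \\ y_1 = y_0 + x_0 \times 1 = 1 \end{cases}$$

$$\begin{cases} x_2 = x_1 - y_1 \times \frac{1}{2} = \frac{1}{2} \\ y_2 = y_1 + x_1 \times \frac{1}{2} = \frac{3}{2} \end{cases}$$

$$\begin{cases} x_3 = x_2 - y_2 \times \frac{1}{4} = \frac{1}{8} \\ y_3 = y_2 + x_2 \times \frac{1}{4} = \frac{13}{8} \end{cases}$$

[Figure: three pseudo rotations from $(x_0, y_0)$ to $(x_3, y_3)$ for $\alpha = 87$, with the lengths $R_0$..$R_3$]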

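CORDIC: The sign determines the rotation direction

$$\alpha = 30 \Rightarrow \text{Pos. Rot.}$$

$$\alpha - \alpha_0 = 30 - 45 = -15 \Rightarrow \text{Neg. Rot.}$$

$$\alpha - \alpha_0 + \alpha_1 = -15 + 26.6 = 11.6 \Rightarrow \text{Pos. Rot.}$$

$$\alpha - \alpha_0 + \alpha_1 - \alpha_2 = 11.6 - 14 = -2.4 \Rightarrow \text{Neg. Rot.}$$

The lengths are constant (precalculated):

$$R_1 = \sqrt{2}; \quad R_2 = \sqrt{\frac{5}{2}}; \quad R_3 = \sqrt{\frac{85}{32}}$$

$$\frac{x_3}{R_3}, \frac{y_3}{R_3} \approx x, y$$

[Figure: rotations from $(x_0, y_0)$ through $(x_1, y_1)$ and $(x_2, y_2)$ to $(x_3, y_3)$ for $\alpha = 30$, with the lengths $R_0$..$R_3$]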

CORDIC Derive x3,y3

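$$\tan\alpha_0 = 1; \quad \tan\alpha_1 = \frac{1}{2}; \quad \tan\alpha_2 = \frac{1}{4}$$

$$x_{i+1} = x_i - y_i \tan\alpha_i$$

$$y_{i+1} = y_i + x_i \tan\alpha_i$$

$$\begin{cases} x_1 = x_0 - y_0 \times 1 = 1 \\ y_1 = y_0 + x_0 \times 1 = 1 \end{cases}$$

Negative Rotation:

$$\begin{cases} x_2 = x_1 + y_1 \times \frac{1}{2} = \frac{3}{2} \\ y_2 = y_1 - x_1 \times \frac{1}{2} = \frac{1}{2} \end{cases}$$

$$\begin{cases} x_3 = x_2 - y_2 \times \frac{1}{4} = \frac{11}{8} \\ y_3 = y_2 + x_2 \times \frac{1}{4} = \frac{7}{8} \end{cases}$$

[Figure: rotations from $(x_0, y_0)$ through $(x_1, y_1)$ and $(x_2, y_2)$ to $(x_3, y_3)$ for $\alpha = 30$]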

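CORDIC: New start vector (no need for multiplication)

$$x_{i+1} = \frac{1}{R_i}(x_i - y_i \tan\alpha_i)$$

$$y_{i+1} = \frac{1}{R_i}(y_i + x_i \tan\alpha_i)$$

Start at $(x_0, y_0) = \left(\dfrac{1}{R_3}, 0\right) = \left(\sqrt{\dfrac{32}{85}}, 0\right) \Rightarrow (x_3, y_3) \approx (x, y)$

$$\begin{cases} x_3 = \sqrt{\frac{32}{85}}\left(x_2 - y_2 \times \frac{1}{4}\right) = \sqrt{\frac{32}{85}} \times \frac{11}{8} \\[2mm] y_3 = \sqrt{\frac{32}{85}}\left(y_2 + x_2 \times \frac{1}{4}\right) = \sqrt{\frac{32}{85}} \times \frac{7}{8} \end{cases}$$

[Figure: the scaled start vector $(x_0, y_0)$ and the result $(x_3, y_3)$ close to $(x, y)$ for $\alpha = 30$]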

Sine and Cosine

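$$x_{i+1} = \frac{1}{R_i}(x_i - y_i \tan\alpha_i); \quad y_{i+1} = \frac{1}{R_i}(y_i + x_i \tan\alpha_i)$$

For $\alpha = 30$:

$$\begin{cases} x_3 = \sqrt{\frac{32}{85}}\left(x_2 - y_2 \times \frac{1}{4}\right) = \sqrt{\frac{32}{85}} \times \frac{11}{8} = 0.844 = \cos(\alpha_0 - \alpha_1 + \alpha_2) \\[2mm] y_3 = \sqrt{\frac{32}{85}}\left(y_2 + x_2 \times \frac{1}{4}\right) = \sqrt{\frac{32}{85}} \times \frac{7}{8} = 0.537 = \sin(\alpha_0 - \alpha_1 + \alpha_2) \end{cases}$$

With the rotation angles $\alpha_i$ taken with sign:

$$x_3 = \cos\sum \alpha_i \approx \cos\alpha$$

$$y_3 = \sin\sum \alpha_i \approx \sin\alpha$$

$$\tan\alpha \approx \tan\sum \alpha_i = \frac{\sin\sum \alpha_i}{\cos\sum \alpha_i}; \quad \text{(division needed)}$$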
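The three iteration example can be reproduced with a short floating point sketch. The function name `cordic_sin_cos` is hypothetical. The start vector is scaled with the precalculated total length, the sign of the remaining angle selects the rotation direction, and the multiplications by powers of two stand in for the shifts of a hardware implementation.

```python
import math


def cordic_sin_cos(alpha_deg: float, iterations: int = 3):
    """Approximate (cos, sin) of an angle in degrees with CORDIC pseudo rotations."""
    # Total vector growth R after all iterations, folded into the start vector.
    total_length = 1.0
    for i in range(iterations):
        total_length *= math.sqrt(1.0 + 4.0 ** -i)  # sqrt(1 + tan^2) with tan = 2^-i
    x, y = 1.0 / total_length, 0.0
    remaining = alpha_deg
    for i in range(iterations):
        d = 1 if remaining >= 0 else -1  # the sign determines the rotation direction
        shift = 2.0 ** -i                # tan(alpha_i) = 2^-i, a shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        remaining -= d * math.degrees(math.atan(shift))
    return x, y


cos_approx, sin_approx = cordic_sin_cos(30.0)
print(round(cos_approx, 3), round(sin_approx, 3))
```

With only three iterations the result corresponds to an angle of about 32.5 degrees rather than 30, so the values are rough approximations of cos 30 and sin 30; more iterations reduce the remaining angle.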

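Basic CORDIC Rotations

How to choose the angles

Shifts | Angles (prestored)
$\tan\alpha_0 = 1$ | $\alpha_0 = \arctan 1 = 45^\circ$
$\tan\alpha_1 = \frac{1}{2}$ | $\alpha_1 = \arctan\frac{1}{2} = 26.6^\circ$
$\tan\alpha_2 = \frac{1}{4}$ | $\alpha_2 = \arctan\frac{1}{4} = 14.04^\circ$
$\tan\alpha_3 = \frac{1}{8}$ | $\alpha_3 = \arctan\frac{1}{8} = 7.1^\circ$
$\tan\alpha_4 = \frac{1}{16}$ | $\alpha_4 = \arctan\frac{1}{16} = 3.6^\circ$

Vector lengths $R_i$ are also prestored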

Basic CORDIC Rotations

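$$x_{i+1} = x_i - d_i\, y_i\, \frac{1}{2^i}$$

$$y_{i+1} = y_i + d_i\, x_i\, \frac{1}{2^i}$$

$$\alpha_{i+1} = \alpha_i - d_i \arctan\frac{1}{2^i}$$

$$d_i = \operatorname{sign}(\alpha_i)$$

Here $\alpha_i$ is the remaining angle after $i$ iterations.

Each CORDIC iteration requires
  3 ADD/SUB
  2 Shifts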

CORDIC Hardware: Iterative

[Figure: X, Y and α registers (REG), each fed by an ADD/SUB unit; two shifters cross-couple the X and Y paths and a lookup table supplies the angles]

Each CORDIC iteration requires
  3 ADD/SUB
  2 Shifts

CORDIC Hardware: Unrolled

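[Figure: three cascaded stages of ADD/SUB units for x, y and the angle, from $(x_0, y_0)$ through $(x_1, y_1)$ and $(x_2, y_2)$ to $(x_3, y_3)$, with the shifts 1/2 and 1/4 between the stages; the sign bits of $\alpha$, $\alpha - \alpha_0$ and $\alpha - \alpha_0 - \alpha_1$ select add or subtract, and the last angle output is $\alpha - \alpha_0 - \alpha_1 - \alpha_2$]

$$x_3 = \cos(\alpha_0 + \alpha_1 + \alpha_2)$$

$$y_3 = \sin(\alpha_0 + \alpha_1 + \alpha_2)$$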

CORDIC Summary

The CORDIC algorithm is used for
  Polar/rectangular conversion
  sine, cosine, tangent …
  arcsine, arccosine, arctangent …
  Hyperbolic functions
  Division
  Square-root
  …

No multiplications needed

One bit accuracy per iteration

Binary Shifter

Four bit shifter

[Figure: four bit-slices with inputs A0..A3 and outputs Q0..Q3, controlled by the signals Right, Left and NOP]

Binary Shifter

[Figure: the four bit shifter with inputs A0..A3 shown for a Right shift and for a Left shift, with a zero shifted in at the vacant position]

Logarithmic Shifter

[Figure: eight bit logarithmic shifter with inputs A0..A7 and three stages controlled by S1, S2 and S4]

Example S=101 (Shift 6 bit left)

[Figure: the logarithmic shifter with inputs A0..A7 and the paths opened by S = 101 marked in the stages S1, S2 and S4; outputs A7 A7 A7 A7 A7 A7 A6 A5]