Digital Integrated Circuits 2e: Chapter 11.4-11.6

Copyright © 2002 Prentice Hall PTR, adapted by Yunsi Fei

ECE 300 Advanced VLSI Design, Fall 2006

Lecture 18: Adders, Multipliers, & Shifters

Yunsi Fei

[Adapted from Jan Rabaey et al.'s Digital Integrated Circuits © 2002, PSU Irwin & Vijay © 2002, and Princeton Wayne Wolf's Modern VLSI Design © 2002]

Review: Binary Adder Landscape

synchronous word parallel adders
– ripple carry adders (RCA)
– carry prop min adders
    – signed-digit adders
    – fast carry prop adders
        – Manchester carry chain
        – carry select
        – carry lookahead
        – conditional sum
        – carry skip
    – residue adders

Complexity annotations in the figure (T = delay, A = area): T = O(N), A = O(N); T = O(1), A = O(N); T = O(log N), A = O(N log N); T = O(N), A = O(N); T = O(N), A = O(N)

Logarithmic Carry Lookahead Adders

Define carry operator € on (G,P) signal pairs

(G'',P'') € (G',P') = (G,P), where G = G'' + P''G' and P = P''P'

– € is associative, i.e.,

[(g''',p''') € (g'',p'')] € (g',p') = (g''',p''') € [(g'',p'') € (g',p')]

[Figure: gate-level € cell with inputs G'', P'', G' and output !G]
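The operator definition above can be checked by brute force. The following is a minimal Python sketch, not taken from the slides: the function name `carry_op` and the convention that the left operand covers the more significant bits are assumptions made for illustration.

```python
from itertools import product

def carry_op(hi, lo):
    """Combine two (G, P) pairs: G = G'' + P''G', P = P''P'.

    `hi` is the (G'', P'') pair (more significant group),
    `lo` is the (G', P') pair (less significant group).
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

pairs = list(product((0, 1), repeat=2))  # all four possible (G, P) values

# Associativity: compare both groupings for every triple of pairs.
associative = all(
    carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
    for a, b, c in product(pairs, repeat=3)
)

# Non-commutativity: a single counterexample is enough.
counterexample = next(
    (a, b) for a, b in product(pairs, repeat=2)
    if carry_op(a, b) != carry_op(b, a)
)

print("associative:", associative)
print("order matters for:", counterexample)
```

Because only four (G,P) values exist, the exhaustive check over all 64 triples is a complete proof of associativity for single-bit signals.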

General Structure

Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order, but note that it is not commutative

Measures to consider
– number of € cells
– tree cell depth (time)
– tree cell area
– cell fan-in and fan-out
– max wiring length
– wiring congestion
– delay path variation (glitching)

Adder structure
– Pi, Gi logic (1 unit delay)
– Ci parallel prefix logic tree (1 unit delay per level)
– Si logic (1 unit delay)
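The three-stage structure (P/G setup, prefix computation, sum logic) can be sketched in Python as follows. This is an illustrative model under stated assumptions: Cin = 0, Gi = ai AND bi, Pi = ai XOR bi, and a serial scan standing in for the parallel tree (associativity guarantees any grouping gives the same prefixes). The names `carry_op` and `prefix_add` are hypothetical.

```python
from itertools import accumulate

def carry_op(hi, lo):
    """(G'', P'') combined with (G', P'); hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def prefix_add(a, b, n):
    """Add two unsigned n-bit integers; returns the (n+1)-bit result."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    # Stage 1: Pi, Gi logic (assumed: generate = AND, propagate = XOR).
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]

    # Stage 2: all prefixes. Each new (more significant) pair is the
    # left operand; the running prefix of lower bits is the right operand.
    prefixes = list(accumulate(gp, lambda lo, hi: carry_op(hi, lo)))

    # With Cin = 0, the carry out of bit i is the G of prefix i.
    carries = [0] + [g for g, _ in prefixes]   # carries[i] = carry into bit i

    # Stage 3: Si logic, Si = Pi XOR Ci.
    total = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return total | (carries[n] << n)

print(prefix_add(11, 6, 4))   # compare with 11 + 6
```

A tree-structured prefix network replaces only stage 2; stages 1 and 3 stay the same.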

Brent-Kung Tree

[Figure: 16-bit Brent-Kung parallel prefix computation. Inputs (G0,P0) through (G15,P15) with Cin = 0 feed a tree of € cells that produces C1 through C16. Figure annotations: T = log2 N, T = log2 N - 2, A = 2 log2 N, A = N/2.]

Tadd = tsetup + (2 log2 N - 2) t€ + tsum
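A small Python model can make the Brent-Kung cell count and depth concrete. This is a sketch under assumptions, not the slide's circuit: N is a power of two, depth is measured as the longest chain of dependent € cells (a cell fires as soon as both inputs are ready), and the names `carry_op` and `brent_kung` are hypothetical. Under these assumptions the model reports 26 cells and a depth of 6 for N = 16, which agrees with the 2 log2 N - 2 term in Tadd.

```python
def carry_op(hi, lo):
    """(G'', P'') combined with (G', P'); hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def brent_kung(gp):
    """Return (prefixes, cell_count, depth) for a list of (G, P) pairs."""
    n = len(gp)
    x = list(gp)
    level = [0] * n      # logic depth at which x[i] becomes valid
    cells = 0

    def cell(i, j):
        nonlocal cells
        x[i] = carry_op(x[i], x[j])
        level[i] = max(level[i], level[j]) + 1
        cells += 1

    d = 1
    while d < n:         # forward tree: group span doubles at each level
        for i in range(2 * d - 1, n, 2 * d):
            cell(i, i - d)
        d *= 2

    d = n // 4
    while d >= 1:        # reverse tree: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            cell(i, i - d)
        d //= 2

    return x, cells, max(level)

_, cells, depth = brent_kung([(0, 1)] * 16)
print("cells:", cells, "depth:", depth)
```

The G component of prefix i is the carry out of bit i when Cin = 0.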

Kogge-Stone PPF Adder

[Figure: 16-bit Kogge-Stone parallel prefix computation. Inputs (G0,P0) through (G15,P15) with Cin feed a tree of € cells that produces C1 through C16. Figure annotations: T = log2 N, A = log2 N, A = N.]

Tadd = tsetup + log2 N t€ + tsum
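For comparison, a Kogge-Stone prefix network can be modeled in a few lines of Python. This is an illustrative sketch assuming N is a power of two; `carry_op` and `kogge_stone` are hypothetical names. For N = 16 the model uses 4 levels (log2 N) and 49 € cells, showing the trade of extra cells and wiring for minimum depth.

```python
def carry_op(hi, lo):
    """(G'', P'') combined with (G', P'); hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def kogge_stone(gp):
    """Return (prefixes, cell_count, levels) for a list of (G, P) pairs."""
    n = len(gp)
    x = list(gp)
    cells = 0
    levels = 0
    d = 1
    while d < n:
        # Every position i >= d combines with the group ending d bits below it;
        # positions below d already hold their final prefix and pass through.
        x = [carry_op(x[i], x[i - d]) if i >= d else x[i] for i in range(n)]
        cells += n - d
        levels += 1
        d *= 2
    return x, cells, levels

_, cells, levels = kogge_stone([(0, 1)] * 16)
print("cells:", cells, "levels:", levels)
```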

DEC “alpha” 21064 Adder

64-bit adder, 0.75 µm technology, 5 ns delay

– On the 8-bit level: Manchester chain
– On the 32-bit sub-block: carry lookahead
– On the 64-bit block: carry select

More Adder Comparisons

[Figure: chart comparing RCA, CSkA, VSkA, and KS PPA adders at 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70]

Serial Adder

May be used in signal-processing arithmetic where fast computation is important but latency is unimportant.

Data format (LSB first):

bit 0, bit 1, bit 2, bit 3, ...

Serial adder Structure

LSB control signal clears the carry shift register:
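The behavior of the serial adder can be sketched in Python: one full adder, one stored carry bit, and an LSB flag that clears the carry at the start of each word. This is a behavioral model only, with hypothetical names (`serial_add`, `to_bits`, `from_bits`); the register is modeled as a plain variable.

```python
def serial_add(a_bits, b_bits, lsb_flags):
    """Bit-serial addition, LSB first.

    a_bits, b_bits: operand bit streams (several words may be concatenated).
    lsb_flags: 1 on the cycle that carries bit 0 of a word, else 0.
    """
    carry = 0
    out = []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:
            carry = 0                    # LSB control clears the stored carry
        out.append(a ^ b ^ carry)        # sum bit of the full adder
        carry = (a & b) | (a & carry) | (b & carry)
    return out

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

# Two 4-bit additions streamed back to back: 9 + 8, then 3 + 5.
a_stream = to_bits(9, 4) + to_bits(3, 4)
b_stream = to_bits(8, 4) + to_bits(5, 4)
flags = [1, 0, 0, 0] * 2
result = serial_add(a_stream, b_stream, flags)
print(from_bits(result[:4]), from_bits(result[4:]))
```

Without the clear on the second LSB, the carry out of the first word (9 + 8 overflows 4 bits) would leak into the second sum.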

Multiply Operation

Multiplication as repeated additions

[Figure: multiplicand times multiplier; the partial product array (can be formed in parallel) is summed into the double precision product. Width labels: N and 2N.]

Shift & Add Multiplication

Right shift and add
– Partial product array rows are accumulated from top to bottom on an N-bit adder
– After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
– Time for N bits: Tserial_mult = O(N Tadder) = O(N^2) for an RCA

Making it faster
– Use a faster adder
– Use higher radix (e.g., base 4) multiplication
    » Use multiplier recoding to simplify multiple formation
– Form partial product array in parallel and add it in parallel

Making it smaller (i.e., slower)
– Use an array multiplier
    » Very regular structure with only short wires to nearest neighbor cells; thus, very simple and efficient layout in VLSI
    » Can be easily and efficiently pipelined
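The right-shift-and-add scheme can be sketched in Python for unsigned operands. This is a behavioral illustration, not a hardware description; the function name `shift_add_multiply` and the split into `acc` (upper half) and `low` (bits already shifted out) are assumptions made for clarity.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply two unsigned n-bit integers with N add-and-shift steps."""
    acc = 0   # running upper part of the product (n bits plus adder carry-out)
    low = 0   # product bits that have already been shifted out
    for i in range(n):
        if (multiplier >> i) & 1:
            acc += multiplicand      # one pass through the n-bit adder
        # Right shift by one bit: the LSB of acc is a finished product bit.
        low |= (acc & 1) << i
        acc >>= 1
    return (acc << n) | low          # 2n-bit double precision product

print(shift_add_multiply(13, 11, 4))   # compare with 13 * 11
```

Each loop iteration costs one adder delay, which is where the O(N Tadder) total comes from.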

Tree Multiplier Structure

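[Figure: the multiplier Q ('ier) controls multiple forming circuits (muxes) that select 0 or D from the multiplicand D ('icand); the partial product array reduction tree feeds a fast carry propagate adder (CPA) that produces P (product)]

Delay: mux + reduction tree (log N) + CPA (log N)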

(4,2) Counter

Built out of two (3,2) counters (just FA's!)

– all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
– the internal output is carried to the next higher weight position (marked in the figure)

[Figure: two stacked (3,2) counters]

Note: Two carry outs, one "internal" and one "external"
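The construction from two full adders can be checked with a short Python sketch. The names `full_adder` and `counter_4_2` and the output ordering (sum, external carry, internal carry) are assumptions for illustration. The check confirms that the five equal-weight inputs always equal the sum bit plus twice the two carry outs.

```python
from itertools import product

def full_adder(a, b, c):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_4_2(x1, x2, x3, x4, cin):
    """(4,2) counter built from two full adders.

    Returns (s, carry, cout): s has the input weight; carry (external)
    and cout (internal) both have twice that weight.
    """
    s1, cout = full_adder(x1, x2, x3)    # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

# Exhaustive check of the counting property over all 32 input patterns.
ok = all(
    sum(bits) == s + 2 * (carry + cout)
    for bits in product((0, 1), repeat=5)
    for s, carry, cout in [counter_4_2(*bits)]
)
print("counts correctly:", ok)
```

Because the internal carry out is produced by the first full adder alone, it never depends on the internal carry in, so a row of these cells has no ripple path.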

Tiling (4,2) Counters

Reduces column height from four to two

– Tiles with neighboring (4,2) counters
– Internal carry in at same "level" (i.e., bit position weight) as the internal carry out

[Figure: row of tiled (4,2) counters, each built from two (3,2) counters]
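Tiling can be illustrated by reducing four n-bit rows to two rows with one (4,2) counter per column. The Python below is a sketch with hypothetical names (`reduce_4_to_2`, `counter_4_2`); the loop visits columns in order only for convenience, since in hardware all columns evaluate in parallel.

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_4_2(x1, x2, x3, x4, cin):
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def reduce_4_to_2(rows, n):
    """Reduce four unsigned n-bit integers to two integers with the same total."""
    sum_row = 0
    carry_row = 0
    cin = 0                                  # internal carry into column 0
    for i in range(n):
        x1, x2, x3, x4 = ((r >> i) & 1 for r in rows)
        s, carry, cout = counter_4_2(x1, x2, x3, x4, cin)
        sum_row |= s << i
        carry_row |= carry << (i + 1)        # external carry: next higher weight
        cin = cout                           # internal carry feeds the neighbor tile
    sum_row |= cin << n                      # last internal carry has weight 2^n
    return sum_row, carry_row

rows = (13, 7, 9, 15)
s, c = reduce_4_to_2(rows, 4)
print(s + c, sum(rows))
```

The two output rows would then go to the carry propagate adder for the final sum.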

4x4 Partial Product Array Reduction

[Figure: 4x4 multiplication showing multiplicand, multiplier, partial product array, reduced pp array (to CPA), and double precision product]

Fast 4x4 multiplication using (4,2) counters

8x8 Partial Product Array Reduction

[Figure: 8x8 multiplication showing multiplicand ('icand), multiplier ('ier), partial product array, and reduced partial product array]

How many (4,2) counters minimum are needed to reduce it to 2 rows?

Answer: 24

Alternate 8x8 Partial Product Array Reduction

[Figure: alternate 8x8 reduction showing multiplicand ('icand), multiplier ('ier), partial product array, and reduced partial product array]

More (4,2) counters, so what is the advantage?

Array Reduction Layout Approach

[Figure: layout with multiple generators, driven by the multiplicand and the multiple selection signals ('ier), followed by stacked (4,2) counter slices and a CPA]

Parallel Programmable Shifters

[Figure: shifter block with Data In, Control, and Data Out]

Control = shift amount, shift direction, shift type (logical, arith, circular)

Shifters used in multipliers, floating point units

Consume lots of area if done in random logic gates

A Programmable Binary Shifter

[Figure: shifter cell with control lines rgt, nop, and left connecting inputs Ai, Ai-1 to outputs Bi, Bi-1]

Ai   Ai-1   rgt   nop   left   Bi   Bi-1
A1   A0     0     1     0      A1   A0
A1   A0     1     0     0      0    A1
A1   A0     0     0     1      A0   0
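The truth table can be reproduced with a small Python model in which each output bit selects its own input, its left neighbor, or its right neighbor. This is a logic-level sketch only (the slide's circuit is built from pass transistors); the function name `shift_one`, the LSB-first list layout, and the zero fill at the ends are assumptions consistent with the table.

```python
def shift_one(a, rgt, nop, left):
    """One-position programmable shifter.

    a: list of bits, a[i] = Ai (index 0 is the LSB).
    rgt, nop, left: control lines; exactly one is expected to be 1.
    """
    n = len(a)
    b = []
    for i in range(n):
        upper = a[i + 1] if i + 1 < n else 0   # source for a right shift
        lower = a[i - 1] if i >= 1 else 0      # source for a left shift
        b.append((nop & a[i]) | (rgt & upper) | (left & lower))
    return b

a0, a1 = 1, 0
print(shift_one([a0, a1], rgt=0, nop=1, left=0))   # [B0, B1] = [A0, A1]
print(shift_one([a0, a1], rgt=1, nop=0, left=0))   # [B0, B1] = [A1, 0]
print(shift_one([a0, a1], rgt=0, nop=0, left=1))   # [B0, B1] = [0, A0]
```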

4-bit Barrel Shifter

[Figure: 4-bit barrel shifter with inputs A0 to A3, outputs B0 to B3, and control lines Sh0 to Sh3]

Example:
Sh0 = 1: B3B2B1B0 = A3A2A1A0
Sh1 = 1: B3B2B1B0 = A3A3A2A1
Sh2 = 1: B3B2B1B0 = A3A3A3A2
Sh3 = 1: B3B2B1B0 = A3A3A3A3

Area dominated by wiring
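The example rows correspond to a right shift that repeats A3 in the vacated positions. A minimal Python sketch of that behavior follows; the name `barrel_shift_right` and the LSB-first list layout are assumptions, and the one-hot control mirrors the Sh0 to Sh3 lines.

```python
def barrel_shift_right(a, sh):
    """Barrel shifter model.

    a:  [A0, A1, A2, A3] (index 0 is the LSB).
    sh: one-hot control [Sh0, Sh1, Sh2, Sh3]; Shk = 1 shifts right by k.
    Vacated upper positions repeat the top input bit, as in the example rows.
    """
    assert sum(sh) == 1, "exactly one shift line may be active"
    n = len(a)
    k = sh.index(1)
    # Output bit i takes A(i+k); past the MSB it takes A(n-1).
    return [a[min(i + k, n - 1)] for i in range(n)]

labels = ["A0", "A1", "A2", "A3"]
for k in range(4):
    sh = [1 if j == k else 0 for j in range(4)]
    b = barrel_shift_right(labels, sh)
    print(f"Sh{k} = 1: B3B2B1B0 =", "".join(reversed(b)))
```

Using signal names instead of bit values makes the output directly comparable with the example rows.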

4-bit Barrel Shifter Layout

Widthbarrel ~ 2 pm N, where N = max shift distance and pm = metal pitch

Delay ~ 1 fet + N diff caps

Only one Sh# active at a time

8-bit Logarithmic Shifter

[Figure: logarithmic shifter with inputs A0 to A3, outputs B0 to B3, and control pairs Sh1/!Sh1, Sh2/!Sh2, Sh3/!Sh3]

log N stages
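The staged structure can be modeled in Python: stage k either passes the word through or shifts it by 2^k, so any shift amount is composed from its binary digits. This is a sketch assuming a right shift with zero fill and an LSB-first bit list; the name `log_shift_right` is hypothetical.

```python
def log_shift_right(a, amount):
    """Logarithmic shifter model.

    a: list of N bits, index 0 is the LSB, N a power of two.
    amount: shift distance, 0 <= amount < N.
    """
    n = len(a)
    bits = list(a)
    distance = 1
    while distance < n:                  # log2 N stages: 1, 2, 4, ...
        if amount & distance:            # this stage's control selects "shift"
            bits = bits[distance:] + [0] * distance
        distance *= 2                    # otherwise the stage passes data through
    return bits

word = [(0b10110100 >> i) & 1 for i in range(8)]
shifted = log_shift_right(word, 5)
print(sum(bit << i for i, bit in enumerate(shifted)), 0b10110100 >> 5)
```

A signal passes through one switch per stage regardless of the shift amount, which matches the K-fet delay term in the layout comparison.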

8-bit Logarithmic Shifter Layout Slice

Widthlog ~ pm (2K + (1 + 2 + … + 2^(K-1))) = pm (2K + 2^K - 1), where K = log2 N

Delay ~ K fets + 2 diff caps

[Figure: layout slice with inputs A0 to A3, outputs B0 to B3, and stages that shift by 1, 2, and 4]

Shifter Implementation Comparisons

N    K    Barrel width   Barrel speed   Logarithmic width    Logarithmic speed
          2 N pm         1 + N diffs    pm(2K + 2^K - 1)     K + 2 diffs
8    3    16 pm          1 + 8          13 pm                3 + 2
16   4    32 pm          1 + 16         23 pm                4 + 2
32   5    64 pm          1 + 32         41 pm                5 + 2
64   6    128 pm         1 + 64         75 pm                6 + 2
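The table entries follow directly from the two width expressions, which a few lines of Python can reproduce. The function name `shifter_costs` and the dictionary keys are hypothetical; widths are in units of the metal pitch pm, and delays are given as (fets, diffusion caps).

```python
def shifter_costs(n):
    """Width and delay estimates for an N-position shifter, N a power of two."""
    k = n.bit_length() - 1               # K = log2 N
    return {
        "N": n,
        "K": k,
        "barrel_width": 2 * n,           # 2 N pm
        "barrel_delay": (1, n),          # 1 fet + N diff caps
        "log_width": 2 * k + 2**k - 1,   # pm (2K + 2^K - 1)
        "log_delay": (k, 2),             # K fets + 2 diff caps
    }

for n in (8, 16, 32, 64):
    c = shifter_costs(n)
    print(c["N"], c["K"], c["barrel_width"], c["barrel_delay"],
          c["log_width"], c["log_delay"])
```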

Decoders

Decodes inputs to activate one of many outputs

– two inverters, four 2-input nand gates, four inverters plus enable logic
– how about for a 3-to-8, 4-to-16, etc. decoder?

[Figure: 2x4 decoder with inputs In0, In1, and Enable]

Out0 = !In1 & !In0
Out1 = !In1 & In0
Out2 = In1 & !In0
Out3 = In1 & In0
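The output equations generalize to any number of inputs: output k is the AND of each input or its complement, chosen by the bits of k. The Python below is a sketch of that generalization; the name `decoder` and the active-high enable are assumptions (the slide does not state the enable polarity).

```python
def decoder(inputs, enable=1):
    """n-to-2^n decoder.

    inputs: [In0, In1, ...] with In0 as the least significant select bit.
    Returns [Out0, Out1, ...]; at most one output is 1.
    """
    n = len(inputs)
    outs = []
    for k in range(2 ** n):
        term = enable                     # assumed active-high enable
        for i, bit in enumerate(inputs):
            # Use the true input where bit i of k is 1, else its complement.
            term &= bit if (k >> i) & 1 else 1 - bit
        outs.append(term)
    return outs

print(decoder([1, 0]))             # In0 = 1, In1 = 0 selects Out1
print(decoder([0, 1, 1]))          # 3-to-8: selects Out6
print(decoder([1, 1], enable=0))   # disabled: no output active
```

Each output of an n-input decoder built this way needs an n-input AND, which is why large decoders are usually composed from smaller predecoders.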

Dynamic NOR Decoder

[Figure: dynamic NOR decoder with precharge devices to Vdd, pull-down transistors to GND driven by A0, !A0, A1, !A1, and outputs B0 to B3, annotated with example logic values]

Dynamic NAND Decoder

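[Figure: dynamic NAND decoder with precharge devices, series transistor chains to GND driven by A0, !A0, A1, !A1, and outputs B0 to B3, annotated with example logic values]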

Building Big Decoders from Small

[Figure: a 1x2 decoder (input A4, with enable) and 2x4 decoders (inputs A3 A2 and A1 A0) composed into a larger decoder, annotated with example logic values]

Active low enable, active low output

Multiplexers

Selects one of several inputs to gate to the single output

– two inverters, four 3-input nands, one 4-input nand
– how about for an 8x1, 16x1, etc. mux?

[Figure: 4x1 multiplexer with data inputs In0 to In3 and select inputs S1, S0]

Out = (In0 & !S1 & !S0) | (In1 & !S1 & S0) | (In2 & S1 & !S0) | (In3 & S1 & S0)
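The 4x1 equation and one possible answer to the 8x1 and 16x1 question can be sketched in Python. The flat `mux4` mirrors the sum-of-products equation; `mux_tree` builds a larger mux from 2x1 stages, which is one common option and an assumption here, not the slide's stated answer. Both names are hypothetical.

```python
def mux4(in0, in1, in2, in3, s1, s0):
    """4x1 mux as a sum of four product terms."""
    ns1, ns0 = 1 - s1, 1 - s0
    return ((in0 & ns1 & ns0) | (in1 & ns1 & s0) |
            (in2 & s1 & ns0) | (in3 & s1 & s0))

def mux_tree(inputs, selects):
    """2^k-to-1 mux as a tree of 2x1 muxes.

    inputs:  list of 2^k data bits.
    selects: [S0, S1, ...], S0 being the least significant select bit.
    """
    level = list(inputs)
    for s in selects:
        # Each 2x1 mux picks the odd-indexed input when s = 1.
        level = [(level[i + 1] & s) | (level[i] & (1 - s))
                 for i in range(0, len(level), 2)]
    return level[0]

data = [0, 1, 1, 0, 1, 0, 0, 1]
print(mux4(*data[:4], s1=1, s0=0))             # selects In2
print(mux_tree(data, [0, 0, 1]))               # S2 S1 S0 = 100 selects input 4
```

The tree form keeps every gate at two data inputs at the cost of log2 of the input count in series stages.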

Review: TG 2x1 Multiplexer

[Figure: transmission gate 2x1 multiplexer with data inputs In1 and In2, select signals S and !S, output F, and VDD and GND rails]

F = !((In1 & S) | (In2 & !S))

Review: Datapath Bit-Sliced Organization

[Figure: bit-sliced datapath with slices Bit 0 through Bit 3. Control flow and data flow run in perpendicular directions. Blocks labeled in the figure: Register File, Pipeline Register, Adder, Shifter, Multiplexer, and decoder, with connections From I$ and To/From D$.]

Tile identical bit-slice elements

Layout of Bit-Sliced Datapaths

Layout of Bit-sliced Datapaths

Without feedthroughs or pitch matching (4.2m2)

With feedthroughs (3.2m2)

With feedthroughs and pitch matching (2.2m2)

Alpha 21264 Integer Unit Datapath

[Figure: datapath floorplan. Labeled blocks: Multimedia engine, Shifter, Intercluster bypass, Adder, Logic box, Register file, Register file decoder, Logic box, Adder, Intercluster bypass, Load bypass, Store FIFO, Address drivers. Labeled drivers and buses: tristate bus driver, bus driver, RC1_0, RC1_1, RC2_0, RC2_1, LSD_1, LSD_0, to D$.]