Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by...
-
Upload
jonathan-taylor -
Category
Documents
-
view
215 -
download
0
Transcript of Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by...
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
ECE 300
Advanced VLSI DesignFall 2006
Lecture 18: Adders, Multipliers, & Shifters
Yunsi Fei[Adapted from Jan Rabaey et al’s Digital Integrated
Circuits ©2002, PSU Irwin & Vijay © 2002, and Princeton Wayne Wolf’s Modern VLSI Design © 2002 ]
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Review: Binary Adder Landscape
synchronous word parallel adders
ripple carry adders (RCA) carry prop min adders
signed-digit fast carry prop residue adders adders adders
Manchester carry carry conditional carry carry chain select lookahead sum skip
T = O(N), A = O(N)
T = O(1), A = O(N)
T = O(log N)A = O(N log N)
T = O(N), A = O(N)T = O(N)
A = O(N)
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Logarithmic Carry Lookahead Adders
Define carry operator € on (G,P) signal pairs
– € is associative, i.e.,
[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]
€
(G’’,P’’) (G’,P’)
(G,P)
where G = G’’ + P’’G’ P = P’’P’
€
€ €
€
G’
!G
G’’
P’’
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
General Structure Given P and G terms for each bit position, computing all the
carries is equal to finding all the prefixes in parallel
(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)
Since € is associative, we can group them in any order – but note that it is not commutative
Measures to consider– number of € cells
– tree cell depth (time)
– tree cell area
– cell fan-in and fan-out
– max wiring length
– wiring congestion
– delay path variation (glitching)
Pi, Gi logic (1 unit delay)
Si logic (1 unit delay)
Ci parallel prefix logic tree (1 unit delay per level)
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Brent-Kung TreeP
aral
lel P
refix
Com
puta
tion
€
G0
P0
G1
P1
G2
p2
G3
P3
G4
P4
G5
P5
G6
P6
G7
P7
G8
P8
G9
p9
G10
P10
G11
p11
G12
P12
G13
p13
G14
p14
G15
p15
€€€€€€€
€ € € €
€
€
€
€
€
€
€ € € € € €
€ €
C1C2C3C4C5C6C7C8C9C10C11C12C13C14C15C16
Cin
0
€
T =
log 2
NT
= lo
g 2N
- 2
A =
2lo
g 2N
A = N/2
Tadd = tsetup + (2log2N-2) t€ + tsum
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Kogge-Stone PPF AdderP
aral
lel P
refix
Com
puta
tion
€
G0
P0
G1
P1
G2
P2
G3
P3
G4
P4
G5
P5
G6
P6
G7
P7
G8
P8
G9
P9
G10
P10
G11
P11
G12
P12
G13
P13
G14
P14
G15
P15
€€€€€€€
€ € € €
€
€
€
€
C1C 2C3C4C5C6C7C8C9C10C11C12C13C14C15C16
Cin
€
T =
log 2
N
A =
log 2
N
A = N
€€€€€€€
€ € € € € € € € € €
€ € € € € € € € € €
€ € € € € €
Tadd = tsetup + log2N t€ + tsum
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
DEC “alpha” 21064 Adder
64-bit adder, 0.75m technology, 5ns delay
On the 8-bit level: Manchester chain On the 32-bit sub-block: Carry look ahead On the 64-bit block: Carry select
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
More Adder Comparisons
0
10
20
30
40
50
60
70
8 bits 16 bits 32 bits 48 bits 64 bits
RCA
CSkA
VSkA
KS PPA
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Serial Adder
May be used in signal-processing arithmetic where fast computation is important but latency is unimportant.
Data format (LSB first):
bit 0bit 1bit 2bit 3...
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Serial adder Structure
LSB control signal clears the carry shift register:
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Multiply Operation
Multiplication as repeated additions
multiplicand
multiplier
partialproductarray
double precision product
N
2N
N can be formed in parallel
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Shift & Add Multiplication
Right shift and add– Partial product array rows are accumulated from top to bottom on an N-bit
adder
– After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
– Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA
Making it faster– Use a faster adder– Use higher radix (e.g., base 4) multiplication
» Use multiplier recoding to simplify multiple formation
– Form partial product array in parallel and add it in parallel Making it smaller (i.e., slower)
– Use an array multiplier» Very regular structure with only short wires to nearest neighbor cells. Thus, very
simple and efficient layout in VLSI
» Can be easily and efficiently pipelined
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Tree Multiplier Structure
partial productarray reduction tree
fast carry propagate adder (CPA)
P (product)
mux + reductiontree (log N)+CPA (log N)
Q (‘ier)
D (‘icand)
DD
D
0
00
0
multiple forming circuits
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
(4,2) Counter Built out of two (3,2) counters (just FA’s!)
– all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
– the internal output is carried to the next higher weight position (indicated by the )
(3,2)
(3,2)Note: Two carry outs - one “internal” and one “external”
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Tiling (4,2) Counters
Reduces column height from four to two– Tiles with neighboring (4,2) counters
– Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
4x4 Partial Product Array Reduction
multiplicand
multiplier
partialproductarray
reduced pp array (to CPA)
double precision product
Fast 4x4 multiplication using (4,2) counters
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
8x8 Partial Product Array Reduction
‘icand
‘ier
partialproductarray
reduced partial product array
How many (4,2) countersminimum are needed to reduce it to 2 rows?
Answer: 24
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Alternate 8x8 Partial Product Array Reduction
‘icand
‘ier
partialproductarray
reduced partial product array
More (4,2) counters, so what is the advantage?
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Array Reduction Layout Approach
multiple generators
multiplicand
multiple selection signals(‘ier)
. . .2(4,2) counter slice
(4,2) counter slice
(4,2) counter slice
CPA
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Parallel Programmable Shifters
Dat
a InControl =
Dat
a O
ut
Shift amountShift directionShift type (logical, arith, circular)
Shifters used in multipliers, floating point units
Consume lots of area if done in random logic gates
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
A Programmable Binary Shifter
rgt nop left
Ai
Ai-1 Bi-1
Bi
Ai Ai-1 rgt nop left Bi Bi-1
A1 A0 0 1 0 A1 A0
A1 A0 1 0 0 0 A1
A1 A0 0 0 1 A0 0
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
4-bit Barrel Shifter
A0
A1
A2
A3
B0
B1
B2
B3
Sh1
Sh2
Sh3
Sh0 Sh1 Sh2 Sh3
Example: Sh0 = 1 B3B2B1B0 = A3A2A1A0
Sh1 = 1 B3B2B1B0 = A3A3A2A1
Sh2 = 1 B3B2B1B0 = A3A3A3A2
Sh3 = 1 B3B2B1B0 = A3A3A3A3
Area dominated by wiring
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
4-bit Barrel Shifter Layout
Widthbarrel ~ 2 pm N N = max shift distance, pm = metal pitchDelay ~ 1 fet + N diff caps
Widthbarrel
Only one Sh#active at a timel
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
8-bit Logarithmic Shifter
A3
A2
A1
A0
!Sh1Sh1 !Sh2Sh2 !Sh3Sh3
B0
B1
B2
B3
log N stages
0 0 0 111
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
8-bit Logarithmic Shifter Layout Slice
Widthlog ~ pm(2K+(1+2+…+2K-1)) = pm(2K+2K-1) K = log2 NDelay ~ K fets + 2 diff caps
A0
B3
B2
B1
B0
A1
A2
A3
1 2 4
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Shifter Implementation Comparisons
N K
Barrel Logarithmic
Width Speed Width Speed
2 N pm 1 + N diffs pm(2K+2K-1) K + 2 diffs
8 3 16 pm 1 + 8 13 pm 3 + 2
16 4 32 pm 1 + 16 23 pm 4 + 2
32 5 64 pm 1 + 32 41 pm 5 + 2
64 6 128 pm 1 + 64 75 pm 6 + 2
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Decoders
Decodes inputs to activate one of many outputs
– two inverters, four 2-input nand gates, four inverters plus enable logic
– how about for a 3-to-8, 4-to-16, etc. decoder?
In0
In1
Enable
Out0 = !In1 & !In0
Out1 = !In1 & In0
Out2 = In1 & !In0
Out3 = In1 & In0
2x4
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Dynamic NOR Decoder
Vdd GND GND
A0 !A0 A1 !A1
B0
B1
B2
B3
precharge
1
1
1
1
on on
on
on
0 1 0 10 1
0
0
0
1
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Dynamic NAND Decoder
GND
A0 !A0 A1 !A1
B3
precharge
B2
B1
B0
0 1 0 1
onon
1
1
1
1
0 1
1
1
1
0
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Building Big Decoders from Small
1x2
A4
enable
A3 A2
2x4
2x4
A1 A0
2x4
2x4
. . .
0 0 0 0 1
1 0 1
Active low enable Active low output
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Multiplexers
Selects one of several inputs to gate to the single output
– two inverters, four 3-input nands, one 4-input nand
– how about for an 8x1, 16x1, etc. mux?
In0
S1 S0
Out = In0 & !S1 & !S0 | In1 & !S1 & S0 | In2 & S1 & !S0 | In3 & S1 & S0
In1
In2
In3
4x1
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Review: TG 2x1 Multiplexer
GND
VDD
In1 In2S S
S S
S
S
!S
In2
In1
F
F
F = !((In1 & S) | (In2 & !S))
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Review: Datapath Bit-Sliced Organization
Control Flow
Bit 0
Bit 1
Bit 2
Bit 3
Tile identical bit-slice elements
Re
gis
ter
File
Pip
elin
e R
egis
ter
Ad
der
Sh
ifter
Pip
elin
e R
egis
ter
Mu
ltip
lexe
r
Mu
ltip
lexe
r
Data Flow
Pip
elin
e R
egis
ter
From I$
Pip
elin
e R
egis
ter
To/From D$
decoder
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Layout of Bit-Sliced Datapaths
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Layout of Bit-sliced Datapaths
Without feedthroughs or pitch matching (4.2m2)
With feedthroughs (3.2m2)
With feedthroughs and pitch matching (2.2m2)
Digital Integrated Circuits 2e: Chapter 11.4-11.6 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei
Alpha 21264 Integer Unit DatapathMultimedia engine
ShifterIntercluster bypass
Adder
Logic box
Register fileRegister
file decoder
Logic box
AdderIntercluster bypass
Load bypassStore FIFO
Address drivers
tristate bus driver
bus driver
RC1_0RC1_1
RC2_0
RC2_1LSD_1LSD_0to D$