The Fundamental Hardware Elements and Options of Digital ...
Transcript of The Fundamental Hardware Elements and Options of Digital ...
University of Central Florida University of Central Florida
STARS STARS
Retrospective Theses and Dissertations
Fall 1983
The Fundamental Hardware Elements and Options of Digital The Fundamental Hardware Elements and Options of Digital
Signal Processing Signal Processing
Mark A. Hemmerlein University of Central Florida
Part of the Engineering Commons
Find similar works at: https://stars.library.ucf.edu/rtd
University of Central Florida Libraries http://library.ucf.edu
This Masters Thesis (Open Access) is brought to you for free and open access by STARS. It has been accepted for
inclusion in Retrospective Theses and Dissertations by an authorized administrator of STARS. For more information,
please contact [email protected].
STARS Citation STARS Citation Hemmerlein, Mark A., "The Fundamental Hardware Elements and Options of Digital Signal Processing" (1983). Retrospective Theses and Dissertations. 686. https://stars.library.ucf.edu/rtd/686
THE FUNDAMENTAL HARDWA..R.E ELEMEHTS AND OPTIONS OF DIGIT1'.i..1 SIGNAL PROCESSING
BY
JYJ.AR.T{ A . HEMMERLEI N B. S.E., University of Central Florida, 1980
RESEARCH REPORT
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Ehgineering
in the Graduate Studies Program of the College of Engineering University of Central Florida
Orlando, Florida
Fall Term 1983
ABSTRACT
The hardware available for digital signal processing is evaluated.
The elements required for implementation of digital signal processing
applications are identified. These elements-- memories, ALUs and mul
tipliers-- are analyzed. Then the operation of parts which make up a
range of possible solutions are evaluated. Thus, the graduating engi
neer is aided in making a transition from the theoretical to the prac
tical world.
TABLE OF CONTENTS
LIST OF TABL~S • • • • • • • • • • • • • • • • • • • • • • • •
LIST OF FIGURES • • • • • • • • •
I INTRODUCTION • • • • • • • • • • •
II BACKGROill.Jl) • • • • • • • • •
III - FUNDJ\MENTAL EL:E!MENTS • • • • • • •
1. 2. 3. 4.
Registers and Memories Addition • • • • Multiplication Di vision • •
• • •
• •
• • •
• • • • • • • • • • • • •
IV - H"PLEMENTATIONS • • • • • •
1. 2. 3. 4. 5.
Correlation/Convolution • • • • • • Fixed Instruction Set Microprocessor Bit Slice Processor • • • • • • • Single Chip Signal Processors • • • Comparison of Implementations • •
• • •
• • • •
• •
•
•
• •
• • • • • • • • •
• • • • • • •
• • • • • • •
• • • • • •
• • • • • • • • • • • • • • • • • • • •
• • • • • •
• • • • • • •
• • • • • • • • • •
• • • • • • • • • • • •
• • • • • •
V - CONCLUSION • • • • • • • • • • • • • • • •
REFERENCES • • • • • • • • • • • • • • • • •
iii
iv
v
1
2
3
3 8
11 14
16
16 18 22 28 31
34
35
LIST OF TABLES
1 . Data Path Delay 16 Bit Ripple Carry Adder/Subtracter • • • • •
2. Design Cost and Time Considerations • • • • • • • • • • • • •
iv
12
33
LI ST OF FIGURES
1. 4096 Bit Static Random Access Memory • • . . • • . • • . . 5 2. 16,384 :Bit Dynamic Random Access Memory • • • • . • . . • . 5 3. 8192 Bit PRON • . • • . • • • . . • . . . . . . • . • . . 7 4. 16 Word by 4 Bit Dual Port R.AJl.i . . • • • . . . • • . • 7 5. FIFO Memory • • • • . • . • • • . . • • . • . . • . . 9 6. Cascaded Full Adder Cells Connected as a Four Bit
Ri pple Carry Adder • . • • • • . . • . . • • • . . • . 10 7. Connection of 16 :Bit ALU Using Ripple Carry • • • • . • . • 10 8 . Full Lookahead Carry 16 Bit Adder • • • . • . . . • • • . • 12 9 . Clocked Add-shift Multiplier . . • . • . • • • • • • • • • 13
10 . Structure for Array Multiplier • . • . • • • • . . • • 13 11. Fixed Poin t Binary Division • • . • • • • • • • • . • 15 12. Di gi tal Correlator • • • • . • • • • • • • • • • • • • . . 17 13. Sam ple Output of Di gital Correlator • • . • • • . . • . . . 17 14 . Convolution Using a Digital Correlator • • . . • • . • • . 19 15. 8085 Yri croprocessor System Using 74S508 to Provide
a Hardware Multiply • • • • • • . . . . • . • . • . • • • . 20 16. Routine for 8085 Software Multiply • • . • . • • • • • • • 21 17. Routine for 8085/74S508 Multiply • . . . • • . • • • • • • 21 18 . Single Precision Format for 8232 Floating Point
Number • • . • . • • • • • . • . . . . • . • • • . . • • • 22 19 . Simple ~li croprograrnm ed Architecture . . . • . . . . • . • . 23 20. Pipelining Bit Slice Architecture . • . • • • . . • . 23 21. AM2903 ALU • • . . • • . • • ·• . . • . • . . • • . • • • • 25 22. Interconnections and Microcode Algorithm for 16
Bit Ri pple Carry Multiply • • • • • • • • . • • • • • • • • 26 23. Array Processor • • • • • • • • . • • • • • • . • • • . • . 27 24. 2920 Signal Processor • • . . . . • • • • • . • • • . . • • 29 25. TMS320 Im pl emen ta ti on for FIR Filter • • • . • • • . . • • 30 26. Complexity vs. Throughput . • • • • • . . • • • . • • • • • 32
v
I. INTRODUCTION
The field of digital signal processing (DSP) continues to be a
rapidly growing frontier in modern technology. In the past decade,
advances in digital hardware have opened doors to new designs and
inventions which were previously not possible. There are two major
subdivisions of digital signal processing. The first is digital fil
tering, which can be broken into finite impulse response (FI R) and
infinite impulse response (IIR) filters. The second is spectral
analysis, which can be broken into the calculation of spectra by
either discrete Fourier transform (DFT) or by statistical techniques
for random signal analysis.
The purpose of this paper is to describe some of the currently
available off-the-shelf hardware for digital signal processing. 'Ihe
purpose is not to discuss design techniques for digital filters or
spectral analysis. It is to examine the operation and specifications
of various industry standard parts.
II. :BACKG~omrn
The primary arithmetic operations for digital signal processing
are addition and multiplication. Consider a~ example from the first
major subdivision of digital signal processing-- digital filtering.
The second order filter model below requires 5 multiplications and
4 additions for each value of y(n) •
. zilso consider an exa~ple from the second major subgroup of digital
signal processing. The fast Fourier transform (FFT), which is at the
heart of most spectral analysis applications, is used to deterLline the
discrete Fourier transform (DFT) of the finite duration sequence x(n)
of length N.
N-1 [ 2 ~ X(k)=Ex(n)exp- N71 nk j
n=O
where k = 0, 1 , 2. • • N-1
In order to perform this spectral analysis, ~ log2I-~ complex mul tipli
cations and N log2N complex ad.di tions are required.
The four primary implementations to be considered for the above
digital signal processing applications are (1) hardwired logic,
including SSI/~IBI, semi-custom and custom integrated circuits; (2) 8
or 16 bit general purpose microprocessors; (3) single chip signal
processors; and (4) microprogrammable bit slice architectures.
2
There are some major tradeoffs for these four implementations.
If high speed, critical instruction sets or long word lengths are
required, the 8 or 16 bit standard microprocessors cannot be used.
3
If parts count and design time must be minimized, then the micropro
cessor-based design is preferred. Yet, if a system requires high speed
and long word lengths and still must be easily upgraded, then the bit
slice microprogrammed design might be preferred. It can usually be
changed by replacement of a PRm1: , which is much easier than cha'1ging
hardwired logic.
Thus, the balance of the paper discusses the memories, adders,
multipliers and controllers available for digital signal processing •
..
III. FUNDAMENTAL ELD:E:i·JTS
1. Registers and 1·1emories
r~emory is a fundamental element in most any digital system.
Digital filters requi re small amounts of high speed R.iU!i for the
storage of the previous input and output values. The spectral analysis
design requires a considerably larger R.Af·l memory for FFT calculations.
Each of the two designs also require memory for the control program.
Nemory can be classified in a number of ways. First, memory can be
static or dynamic. Static memory retains its information indefinitely
as long as power is on. Figure 1 shows a 1K by 4 bit static RAJ'i .
When V.I is high, the data input buffers are inhibited to prevent erro
neous data from entering the array. Data within the array can only be
changed during an overlap of CS and WE low. The MCM2148H-45 has a
maximum access time of 45 ns. Access time tA is defined as the time
from when the address initially becomes stable to the time when the
data becomes valid on the bus during a read cycle. Dynamic memory
requires a periodic refresh in order to maintain validity. Dynamic
memory achieves much greater density at the cost of additional refresh
circuitry and increased maximum access time. Figure 2 shows a block
diagram of the MCM4116B-15 which is a 16K bit dynamic RA1': with a maxi
mum access time of 150 ns. There are 14 address bits required to
decode one of the 16,384 cell locations. The address lines are multi
plexed onto the seven address inputs and latched into the on-chip
4
Fig-u.re 1.
R"As-------..
AS-------,..
A<I -------- Mux Addrns
AJ-------- Input
Buffers A2 -------- ( 7 )
A\
AO--------
BLOCK DIAGRAM
Row Select
Input Data
Control
• • •
Memory Array 64 Rows
64 Columns
AO Al A2 A3
Vcc=P1n 1s v 55 =Pin 9
4096 Bit Static Random Access Memory (Motorola, 1980)
BLOCK DIAGRAM
Row
Wrote
Clocks
Data In
Buffer
----"DD ----vcc ----vss
V99
Data In 0
Data Out
Buffer
1-------- Oat~ Out
Dummy Cells
Memory Arrav
Memory Array
Dummy Cells
6<1 ·Column Select L1nti'I
Co1umn Decodars
1 ·0f ·6•
AO
1 ·Of 2 Data
Figure 2. 16384 Bit Dynamic Random Access Memory (Motorola, 1980)
5
6
latches. The first clock, the row address strobe (RAS), latches the
seven row address bits into the chip. The second clock, the column
address strobe (CAS), latches the seven column address bits into the
chip. Data to be written into a selected cell is then latched into an
on-chip register by a combination of WRITE and CAS while RAS is active.
Data is to be retrieved from the selected cell in a read cycle by
maintaining WRITE in the inactive state throughout the portion of the
memory cycle in which CAS is active low.
Memory can also be classified by wri tabili ty. RO:Ms can only be
read, not written. PROMs can be written to only once by the user.
EPROMs can be erased ul travioletly and reprogrammed. EEPR01'1s can be
both erased and written to electrically. ROMs, PRO~IB, EPRO~IB and
~'PROMs all retain information after the power is turned off and are
usually used to store the controlling program. The read access for
the above chips are all very similar. Figure 3 shows a block diagram
for an 8192 bit PROM. By applying a unique address to A0 - ~ and
enabling the chip the output can be read on lines o0 - o7• The access
time for such a PROM is around 60 ns. The EEPROM 2816 has a read
access time of 250 ns, but it can also be written to electrically in
10 ms. RAMs can be written and read from at about the same speeds.
Unlike the previously mentioned forms of memory, the R.AI1's information
will be destroyed when power goes down.
The final way to classify memory is by random or sequential acces
sibility. In random access memory, successive addresses can be chosen
arbitrarily. RAMs may also be accessed through multiple ports. Figure
4 shows the AM29705 Dual Port RAM. The RAM features two separate
.. ~ .. 1 OF•
"°w A7 DeCOOER .. .. Ao A1 1Of14
~ COL..-t
DeCODER
AJ
Ci,
~ cs, ca..
Figure 3.
"1
x-:-rn
" ADDllESS DECODE II
FOllCE A --t~oo-----1 ZEllO
on --t~-0-----1
BLOCK DIAGRAM
COL..-t TUT RAIL
I i ... ,. = 8 ARRAY I!! • ~ i • Ill
~
TUTWON>O
~.,.WON) 1
8192 Bit PROM (Advanced ~licro Devices, 1982)
LOGIC DIAGRAM Do
" ADDllESS
A.l'()llT
A.OAT.A 4·81T
LATCH
D1 D2 D3
16·WOllD 8Y 4·llT • 1'#0-POA • 11.Alll ADDllESS
WE cs 8.l'OllT
13
• 82 ADDllESS DECOOEll 81
Bo
t----OC~I-- CJR
Figure 4. 16 Word by 4 Bit fual Port RAJ~ (Advanced Micro Devices, 1982)
7
8
output ports such that any two 4 bit words can be read from these
outputs simultaneously. In sequential memory, adjacent bits appear at
the output in sequence. Sequential memories take the fonn of first-
in-first-out (FIFO) memories and shift registers. FIFOs may be util-
ized whenever two variable speed processes must be interfaced or a
number of samples must be taken b~fore processing may begin (FFT proc-
essing). Figure 5 shows the block diagram for the c5/c67401/A FIFO
memory. Data enters when the shift in (SI) input goes high. Data
exits when the shift out (so) input goes high. The outputs input
ready (IR) and output ready (OR) are low when the memory is full or
empty respectively.
2. Addition
Figure 6 shows a ripple carry adder. Arithmetic logic units (ALU)
which utilize the ripple carry technique make up the simplest and
slowest designs. Consider four .AM25LS2517s used in such an ALU shown
in Figure 7. The worst case delay is computed using the following
three steps. First, select the longest combinatorial delay in the
least significant device from any input to the carry output C n+4.
This delay is usually from A or B inputs to the carry output. Second,
add the carry input to the ca:rry output propagation as many times as
required to represent each of the intermediate 4 bit ALUs. Third,
take the propagation delay from the carry input to the ALU adder out-
puts. Table 1 indicates the worst case delay for the ripple ca:rry
adder is 103 ns.
Block Diagram•
INPUT "fAOY
CS/C87.a1/ A Mx4
Figure 5. First-In-First-Out (FIFO) Memory (Monolithic Memories, 1981)
0 /
C1N Vo Xo
A
Co
So
Figure 6.
v, x, V2 X2 V3 X3
c B A A A
Co Co Co
s, 52 SJ CoUT
Cascaded Full Adder Cells Connected as a 4 Bit Ripple Carry Adder (Advanced Micro Devices, 1982)
•o • ' f~ •3
Figure 7.
•• F~ <6 f 7
Connection for a 16 Bit ALU Using Ripple Carry (Advanced Yrlcro Devices, 1982)
10
11
Figure 8 shows a full lookahead carry 16 bit adder. The worst
case delay is computed by summing three delay paths. The path delay
for Ai or Bi to G or Pis 26 ns (AM25LS381). Path delay for en to F3
is 19 ns. Note the sum delay of 55 ns is almost double the speed of
the ripple carry adder.
3. Multi plication
Multipliers can be broken down into two categories-- clocked
multipliers and array multipliers. Figure 9 illustrates the structure
of a clocked add-shift multiplier. Clock 1 shifts they bits into the
AND gates so that the x bits or Os are applied to the adders. After
the adders have stabilized, clock 2 strobes the adder outputs into the
accumulator flip-flops, which then are shifted one bit to the right.
This adder, having no lookahead logic, has a worst case delay for an
n bit adder of (n-1)Tc+T8+TR where TC is the ca:rry propagation time,
T8 is the sum propagation delay, and TR is the y-shift register set
tling time. Thus, if n bits are multiplied by m bits the delay time
for the multiplication is m(n-1)TC+mT8+TR.
The array multipliers are considerably faster than clocked multi
pliers. The array multipliers consist of a two-dimensional array of
one bit adders and are completely memoryless. The settling time for
the array of Figure 10 requires m sum delays and n ca:rry delays to
reach p7
; but to reach p13
n additional carry delays are required.
Thus, the total delay time is 2nTc+mT8•
'-----'C..
Figure 8.
Table 1.
Path
Aj or Bi to Cn+4
Cn to Cn+4
Cn to Cn+4
Cn t<;> Fi
Data Path Delay 16 Bit Ripple Carry Adder (Advanced Micro Devices, 1982)
Output Units'
Fi Cn+4 OVR
36 36 36 ns
22 22 22 ns
22 22 22 ns
23 - - ns
Cn to Cn+4 or OVA - 22 22 ns
TOTAL
16-bit delay 103 102 102 ns
i:, •, c,, •• l:2 '2 c,., .. _ Full Lookahead Carry 16 Bit Adder (Advanced Micro Devices, 1982)
12
CLOCK
SHIFT REGISTER x 3 x 2
AND GATES --- -- ---- -
ADDERS------ - -
SHIFT REGISTER __ AND ACCUMULATOR
CLOCK 2
Figure 9.
SUM INPUTS
CARRY O~RY 11\1
- ls~ OUT
Clocked Add-shift Multiplier (Rabiner-8-old, 1975)
Figure 10. Structure for an Array Multiplier (Rabiner-Gold, 1975)
13
14
4. Division
Division is much more difficult to realize than multiplication.
The logical nature of division is such that it does not lend itself to
speed-up as well as multiplication and, therefore, takes about four
times longer than multiplication. The simplest division scheme is sub
sequent subtraction. First, subtract the divisor from the dividend
and increment a counter. Continue as long as the remainder is positive.
When the remainder becomes negative, cancel the last step. The counter
will contain the quotient and the remainder will be correct.
A more rapid division can be realized by calculating the quotient
digits instead of counting them. First, the divisor is subtracted
from the most significant part of the dividend. If the remainder is
positive the quotient digit is "1"; otherwise the subtraction is can
celled by adding the divisor to the remainder, and the quotient digit
will be "0". Now shift the remainder one place to the right and
repeat until all the quotient digits have been calculated. Figure 11
shows an example of this binary division.
0.101110
0.011010)0.010011 -011010
001100 011010
-011010
010110 -011010
010010 -011010
001010 011010
Divisor larger than dividend, shift
Subtract, enter 1 Shift divisor, enter 0 Shift divisor
Subtract, enter 1 Shift divisor
Subtract, enter 1 Shift divisor
Subtract, enter 1 Shift divisor, enter 0
Figure 11. Fixed Point Binary Division (Hill-Peterson, 1978)
15
IV. IMPLEMENTATIONS
1. Correlation/Convolution
The correlation between two functions is a measure of their simi-
larity. Digital signal processing requires functions to be represented
in discrete form where time scale and amplitude are quantized into
discrete steps. Correlation in discrete form is given below.
oO
R(n) = L v 1(k) v2 (n+k) k=- 00
R(n) is the correlation between two signals, v 1(k) and v2(k), where k
and n are digital indexes representing their common independent vari-
able, generally time. It is determined by multiplying one signal
v 1(k), by the other signal shifted in time, v2(n+k), and then taking
the sum of the products. Convolution in discrete form is given below.
00
y(n) = ~ h (k) x(n-k) k=-oO
Correlation handles the two functions forward in time, while convolu-
tion treats one forward h(k) and one backward x(k). The filter out-
put is represented by y(n). Figure 12 shows the basic structure for
an all digital correlator. Each bit of an input shift register is
compared with a reference shift register through an exclusive-NOR gate.
The outputs are then summed. Figure 13 shows a sample output.
There are five steps to convolve two functions using a correlator.
First, load register M with function h(k) "backward" as shown in
INPUT SIGNAL
STORED REFERENCE
INPUT SHIFT REGISTER
• • •
• • •
CORRELATOR OUTPUT
REFERENCE SHIFT REGISTER
ANALOG OR DIGITAL SUMMER
Figure 12. Digital Correlator (TRW, 1981)
7-31T REGISTERS
• ::tJ m C'l
0
Vi 0 ~ ::tJ (/')
::c :;; ~ Cll
0
0
Figure 13.
0
0
0
CORRELATION
B t:_~
0 0 0 Ao 3
0 0 0 Al 2
0 0 0 A2 2
~
Sample Output of a IJigi tal Correlator (TRW, 1981)
17
18
Figure 14. Second, clock function x(k) forward through A until x 1 and
h 1 align. Third, compute the first convolution point, y 1=x1•h1
•
Fourth, advance "A" register contents one bit, and compute y2=h1x2
+h2x
1•
Fi fth, continue until the sequence is completed. The TRW TDC1023J has
a maximum bit rate of 20 MHz.
2. Fixed Instruction Set General Purpose Microprocessor
Most general purpose microprocessors either have 8 bit or 16 bit
word lengths. The microprocessor sacrifices bandwidth and word length
for board space and design ease. In di gital signal processing, the
throughput of the microprocessor system can usually be improved by
using a coprocessor for multiplication. Figure 15 shows an 8085-based
system using a 748508 to provide a hardware multiply. To compare
throughput, consider the software multiply of Figure 16. Also consider
the routine for the two-chip 8085/748508 combination which is shown in
Figure 17. The multiplication results are stored in registers Band E.
Worst case execution time drops from 181 qs to 7.5 ~s by using the
hardware multiply. The 8086 can do a firmware multiply in around 80 qs.
One way to dramatically increase the dynamic range of the micro-
processor-based system is to use a floating point coprocessor. The
8232 can provide 64 bits of precision for the 8085/8086 families. The
single precision format is shown in Figure 18. Bit 31 is the sign bit
where "0" represents positive. Bits 23-30 represent the exponent.
Bits 0-22 are the mantissa which represent the fractional magnitude of
the number.
Step 1: Load register M with function "h" (backward) to state "Mo."
Step 2: Clock function "x" forward through register A until x 1 and h 1 align.
Step 3: Compute first convolution point, Y1 = x1 • hi.
Step 4: Advance "A" register contents one bit, compute v2 = hix2 + h2x1.
Step 5: Continue until last step, x4h7 in this case.
Figure 14. Convolution Using a Digital Correlator (TRW, 1981)
19
/
.... 1
PAL .. DECODER
• /
/3 i • ~
INST GO R/W
OVERFLOW RAM 1tl508
ADD/ CONTROL ~ [a · ~K f4--
• • 8085 v v v
, 8 J 8 '8
AD0-AD7 - t t --CLK / - ,, 1
CLK
/
/
Figure 15. 8085 I'licroprocessor System Using 748508 to Provide a Hardware !"iul ti ply (Gordon, 1982)
20
MULTY LXI H,O CLEAR H AND L BYTES FOR RESULT
MVI D,O CLEAR THE HIGH PART OF THE SHIFT REGISTER
MVI c, 7 INITIALIZE COUNTER MU~T MOV A,B
RRC ROTATE B RIGHT, SAVE THE CARRY BIT
JNC SFT JUMP TO SFT IF CARRY =O DAD D ADD HAND LTO DAND E
SFT DEC c DECREMENT COUNTER JZ EXIT JUMP TO EXIT IF COUNT-
ER=O STC CMC CLEAR CARRY MOV A,E RAL ROTATE E LEFT, SAVE
CARRY BIT MOV E,A STORE NEWE MOV A,D RAL ROTATE D LEFT THROUGH
THE CARRY BIT
MOV D,A STORE NEWD JMP MULT
EXIT RET
Figure 16. Routine for 8085 Software Multiply (Gordon, 1982)
MOV 106, E LOAD X, INSTRUCTION 6 MOV 100, B LOAD Y, INSTRUCTION 0 NOP MULTIPLY MOV B, 107 READ AND STORE MSB OR
RESULT, INST 7 MOV E, 107 READ AND STORE LSB OR
RESULT, INST 7
Figure 17. Routine for 8085/748508 Multiply (Gordon, 1982 )
21
31 30 23 22 0
I s I E l M
Sign Exponent liantissa
Figure 18. Single Precision Format for an 8232 Floating Point Number
A nwnber of applications utili ze the microprocessor for offline
22
signal processing. These applications no longer require the high speed
real-time multipliers, adders and assorted processing hardware. The
real-time problem now is to sample and store on secondary memory as
rapidly as possible. Thus, high speed data handling devices such as
direct memory access (D}lA ) controllers combined with multiple buffering
techniques become the primary real-time ingredients.
3. Bit Slice Processor
Figure 19 shows a simple microprogrammed architecture built around
the AJV12901 bit slice ALU. The instruction lines, addresses, and the
data inputs normally will come from registers latched with the same
clock as the AM2901. The system is driven through the pipeline register
from a PROM containing the stored microprogram. This memory contains
microinstructions which apply the proper control signals to both the
ALU and other control circuitry. The address lines are generated by
the AJvI2901 microprogram sequencer. The start address plus control sig-
nals from the previous instruction aid this device in incrementing an
address, jumping to an address, storing an address or linking to sub-
routines. Note that as one instruction is executed the next instruc-
tion is being read from program memory. Any number of 4· bit slice
OTHER ADDRESS SOURCES
a-l....-.. ()II Am2111a ()II ""'29101
Figure 19.
FROM DATA IUS
~i 1•11
INSTRUCTION I REGISTER I
OR l'LA)I ~ING I l'ROM
.. A .., ( ,,
.... "" ""J D So
S1 FE
IEOUENCEll l'V'
y C' c,,
t @
J)
... "'~ A
START ADO RESS
ADDRESS
MICROf'ROGRAM STORE
CROMll'flOMl
COf'TROL
A , 8
JUtll' ADDRESS
CLOCK
TO OTHER DEVICES
DATA IN
DATA OUT
Simple Microprogrammed Architecture (Advanced J1·iicro Devices, 1982)
CLOCK l)__r-
... v
... CONTROL ...
v LOGIC ~ROM . SS 1 l ....
STATUS I ~ REGISTER
r-..r- (M\2111) c
- -FROM DATA IUS
~i ~
I=- 1-1 DATA
I REGISTER
cc MICROl'ROGRAM -«> llTS L
{J, MEMORY
0 ' IA C' 0 c;.., ~ROM I ... -21eTOOWOROS l'tl'ELIN£ G) ..
1 I@
v H z I @
@; REGISTER A ""'2901 ARRAY OVR -
ruction currently being exec:ut8d 1 Microinst
2 SeQuen mr control tin• •l•ct aource of next m icroinstruction eddrea
3 Next m
' Next m
icrolnstruction eddrn1
ictolnstruction
I Status bits from current microinstruction
I Status bits from last microinstruction
Figure 20.
,, e OR ... I ~ARRAY F3 Cn
y F•O
I l
... ~ ~ TO OTHER TO DATA IUS
SYSTEM ELEMENTS
(e. I ENAILES ON MAR , IR . ORI
Pipelining Bit Slice Architecture (Advanced Micro Devices, 1982)
i I ....... I ~TROLUfillT AHO/OR MEMORY
ADDRESS REGISTER
9 TO
ADDRESS IUS
23
24
AM2901s c-m be connected to increase the dynamic range of the CPU.
Thus, the CPU can be 4, 8, 12, 16, 32 or any increment of 4 bits wide.
Figure 20 illustrates a more complex pipelining architecture.
The same clock pulse loads the instruction register, updates the
sequencer, latches the next instruction out of the pipeline register,
and transfers the accumulator status to the sequencer. The AM2903 ALU
is shown in Figure 21. It contains an instruction decoder, scratch
pad RAM, ALU and multiply hardware for a 4 bit slice. The instruction
lines r0-r8 can be broken down as follows: r 5-r8 identify the destin
ation, r 1-r4 identify the ALU function, and r0 is a multiplexer select.
Figure 22 shows the interconnections and microcode algorithm for a 16
bit ripple carry multiply.
The AM29500 family make up an 8 bit slice processor which is
specifically targeted for digital signal processing applications. The
A1'129501 is the processing unit. Its pipelined architecture allows any
combination of register instructions, ALU instructions, and I/O instruc
tions that can be microprogrammed to occur in the same cycle. This
architecture allows overlap of external multiplication, ALU operations
and memory I/0 -as shown in Figure 23. The AM29516/AM29517 are 16 by
16 bit parallel multipliers. The array multiplier provides a 40 ns
multiply time. The family also features a programmable FFT address
sequencer-- the AM29540. It provides the RAM and ROM addresses neces
sary to perform the repetitive butterfly operations of the FFT. The
FFT sequencer provides a maximum transform length of 65,536 points.
BLOCK DIAGRAM
DATA IN
Ao - 3 e>------r--+--_. :DDRESS ADORES: t----.,,C---cJ llo - J
RAM MllTE iJl ENABLE
A B 1'--llO---DATA DUT DATADUT
~IN
~OVRCJ-------+---....;:i~
SI~ ~>l-------+-------1
l!IVe>-------+-------OtJ
iTN r>------
i:E D------4 INSTRUCTION e DECODE
CP
Vt>-------.....(~!5Gi
..----r---~c:ir 08o- J
0100
----ClCP
----a Vee
----ClGND
Figure 21. AH2903 Arithmetic Logic Unit (ALU) (Advanced Micro Devices, 1982)
25
........
.._.
O:i
Cou
r r;
=)D
--A
AM
3
Cn
·4
Ve
e
F3
470!
1 >
.>
O
VA
F =
0 z R
AM
3 =
F:i V
C,,
. 4
V C
, 1,
3 =
F 3V
OV
A
Oo
I 1,
I A
DO
0
1 B
+ 0
0 0
D
D
Lo--
.; "
,.v
... v
... 4
4 -
4 4
I I l
---
-O
o -
Oi
Ou
OJ
Ou
a,
AA
M0
-A
AM
J A
A M
u R
AM
J R
AM
0 R
AM
J
Cn
c ...
4 c,
, c,
,. 4
c,,
c ...
4 A
m2
90
1
Am
29
01
A
m2
90
1
Am
29
01
-
F3
-F
J -
F3
-O
VA
-
OV
A
-O
VA
~
F =
0
-F
= 0
-F
= 0
I I
I
' I
,..v
[,..
...
i.-
9 ..
9 9
• I,
=
00
v ...
v ,..
.i.-
v ....
4 4
4 ..
4 '
v v
v v
The
Alg
orit
hm
Th
e m
icro
cod
e al
gori
thm
is
as f
ollo
ws:
r1~".lr
D
Lo
ad m
ulti
plic
and
into
R"
Lo
ad m
ulti
plie
r in
to Q
If n
ot
do
ne,
ad
d,
then
sh
ift
Q d
ow
n a
nd
sh
ift
R"
do
wn
If d
on
e. a
dd
, th
en
shif
t Q
do
wn
an
d
shif
t R
n d
ow
n
D
n
R "
+-m
ulti
plic
and
Q+
-mu
ltip
lier
(Q
., ca
n b
e se
nse
d)
If Q
0 =
0,
Rn+
-( R
u +
0)/
2: Q
+-Q
/2
If Q
11 &
I,
Rn+
-< R
n+
R.\)
/2:
Q+
-Q/2
R
epea
t lo
op u
ntil
co
un
ter
= 0
If Q
11 =
0,
Ru+
-R1/2
: Q
+-Q
/2
If Q
11 =
I ,
R n
+-( R
tt -
R .·\
)/2:
Q +
-Q/2
Ou
RA
M0
, __
l c .
. -
~ '
LO
W
I ,.v 9
6 O
o
I , __
[\..
_ M
UX
11
SE
L
'· " I
t s "
8/
I,
·--<
Fig
ure
2
2.
Inte
rco
nn
ecti
on
s an
d A
lgo
rith
m f
or
the
16 B
it R
ipp
le
Car
ry M
ult
iply
(W
hit
e,
1981
) I\
)
0\
ALG
OR
ITH
M #
1
Am
29
54
0
1 '
FF
T A
DD
RE
SS
S
EQ
UE
NC
ER
ALG
OR
ITH
M #
2
ALG
OR
ITH
M #
3
fl) ::>
m
fl)
fl) w
a:
Q
Q c
AD
DR
ES
S
PIP
ELI
NE
A
m29
520/
1
AD
DR
ES
S
PIP
ELI
NE
A
m2
98
25
HO
ST
CO
MP
UT
ER
INT
ER
FA
CE
2
92
7/2
8/4
0/4
2/5
0/5
1/5
2/5
3
DA
TA
RA
M
Am
93
42
2
OR
A
m91
47
CO
EF
FIC
IEN
T
PR
OM
A
m27
S18
1
MIC
RO
CO
DE
ME
MO
RY
A
m27
S35
MIC
RO
PR
OG
RA
M C
ON
TR
OL
Am
29
10
/14
/25
IMA
GIN
AR
Y D
AT
A
RE
AL
P
RO
CE
SS
OR
A
m29
501
MIO
16 x
16
IMA
GIN
AR
Y
PR
OC
ES
SO
R
Am
2950
1
Ml
MU
LTIP
LIE
R I p
I
Am
2951
6/7
Fig
ure
23.
A
rra
y
Pro
cess
or
(Adv
ance
d M
icro
Dev
ices
, 19
82)
I\)
-..J
28
4. Single Chip Signal Processors
The Intel 2920 is a programmable single chip signal processor.
It contains an input multiplexer, a sample hold, an analog to digital
converter, a scratch pad RAM, an EPROM, an ALU, a digital to analog
converter and an output multiplexer on a single chip. The 2920 is
designed to replace analog subsystems such as filters, threshold
detectors, limiters, oscillators, and waveform generators to name a
few. The 2920 has a dynamic range limited to nine bi ts and a band
width limited to 13 KHz. Figure 24 shows a block diagram for the
processor.
The TMS320 is a single chip microcomputer aimed at giving both
the design ease of a general purpose microprocessor and the high
throughput of a rnultichip design. The TIVl.8320 is based on a modified
Harvard architecture which supports separate program and data memory
spaces. This architecture allows for the overlap of fetch and execute
cycles. The chip also features a 16 X 16 multiply at 200 ns and gives
a full 32 bit result. Figure 25 shows a TMS320 FIR filter implementa
tion and corresponding speed of execution.
v~~~~~~-1----------------------------------------.. '°"OGRAM STORAGE IEPROM l -------RST/EOP 192 • 2C RUN.flWC ____ .. .... ...,.,----------...., ...... ----.~--....,.~--..... ~
SIGIN 101------lr-.-.,..,
SIGIN 111-----_.
llGIN 121-----_.
llGIN 131-----_. CAP1
•CAP2
MUX
• aH
CLOCK LOGIC
• PROGRAM COUNTER
"EXTERNAL COMPONENTS
A/O CIR.
OMUX a.
S&H·a
DIA
Figure 24. Block Diagram 2920 Signal Processor (Intel Corporation, 1981)
SIGOUT 10)
SIGOUT 111
SIGOUT 121
SIGOUT Il l
SIGOUT !Cl
SIGOUT 151
SIGOUT 161
SIGOUT (71
29
The following equation is used to implement a filter model:
y(n) = Ax(n)+Bx(n-1)+Cx(n-2)+Dx(n-3)
Clock Cycles 2 1 1 1 1 1 1 1 1 1 1 1 2 2
17
START IN ZAC LT MPY LTD MPY LTD MPY LTD :MPY APAC SACH OUT B
TS = sampling period
Ts = 17(200 ns) = 3.4 qs
f = -1 = 294 KHz
S TS
X1, PAO
X4 D X3 c X2 B X1 A
y Y, PA 1 START
Figure 25. TMS 320 Implementation for FIR Filter
30
31
5. Comparison of Implementations
Figure 26 is a graph showing the complexity vs. throughput for
various implementations. The fast Fourier transform ranges from 32
point transform used in instrumentation to the very complex transforms
used in pattern recognition. The digital filtering applications range
from voice to video. The digital dynamic controllers require both
lower bandwidths and less complexity.
Figure 26 also shows the general techJ1ology requirements for
particular applications. For example, almost all control applications
can be handled without requiring a bit slice processor. Also very few
digital filtering applications can utilize an 8 bit microp~ocessor.
The TJV'S320 filter implementation of Figure 25 has a sampling
period of T8
equal to 294 KHz . Thus, if the input analog signal
requires at least 20 samples per period then the maximum bandwidth is
14.7 KHz.
BW = 1 20TS
Table 2 shows the design and cost tradeoffs for the various imple-
mentations. The 8 bit processor is the easiest to design and the
SSI/F1SI discrete logic is the most difficult. The TMS 320 provides an
excellent tradeoff of throughput and design time.
COM
PLEX
ITY
(N
UMBE
R OF
M
ULT
IPLI
CA
TI
ON
S PE
R
SAM
PLE
TIM
E)
10:M
1M
100K
10K
1K
100 10 1
0.0
01
0
.01
DIG
ITA
L DY
NAM
IC
CONT
ROL
0.1
1
10
FAST
FO
URI
ER
TRAN
SFOR
M
100
DIG
ITA
L FI
LTER
ING
1K
10K
.AN
ALO
G-S
IGN
AL
BAND
WID
TH
(Hz)
Fig
ure
2
6.
Co
mp
lex
ity
vs.
T
hro
ug
hp
ut
1.
2. 3.
4.
5.
6.
100K
lM
4 B
it ~
licroprocessor
8 B
it M
icro
pro
cess
or
8/1
6 B
it I
v'J.
icro
proc
esso
r 16
B
it M
icro
pro
cess
or
TM
S320
, 4
Bit
Sli
ce
8 B
it S
lice
\.>J
I'\.
)
Tab
le 2
. D
esig
n
Co
st a
nd
Tim
e C
on
sid
erat
ion
s fo
r a
10
th O
rder
Dig
ital
Fil
ter
Des
ign
D
esig
n
Tim
e C
om
ple
xit
y
Imp
lem
enta
tio
ns
(Day
s)
(if
Ch
ips)
8 B
it M
icro
pro
cess
or
4 5
16 B
it M
icro
pro
cess
or
7 8
INT
EL
29
20
4 1
TN
S320
6
6 4
Bit
Sli
ce
20
35
8 B
it S
lice
15
20
SS
I/M
SI
60
150
Ele
ctr
on
ic
Com
pone
nt
Co
st
(Do
llars
)
45
100
100
300
700
1000
20
0
No
te:
The
ab
ov
e fi
gu
res
sho
uld
o
nly
be
use
d i
n
a re
lati
ve s
en
se.
Des
ign
ti
me
is
dep
end
ent
on
the
exp
erie
nce
o
f th
e d
esi
gn
er.
T
he
co
st i
s d
epen
den
t up
on
the
env
iro
nm
ent
in w
hich
th
e sy
stem
is u
sed
and
part
s co
st a
t th
at
tim
e.
"""'
"""'
X. CONCLUSION
Digital signal processing requires high speed arithmetic and con
trol sequencing capabilities for both digital filtering and spectral
analysis. The fundamental hardware elements are memories, adders, mul
tipliers and dividers. Memory can be classified as either static or
dynamic. It can also be classified by either writability or accessi
bility. ALUs can utilize the slower ripple carry adders or the more
complex lookahead carry adders. Division tends to be slower and more
complex than multiplication.
There is a range of possible solutions available to implement
digital signal processing applications. Correlation is a measure of
similarity between functions. The digital correlator may be used to
convolve two funGtions to realize a desired filter response. The gen
eral purpose microprocessors are desirable in applications requiring
limited dynamic range and limited bandwidth. The bit slice processor
can attain greater dynamic range and greater speed than the micropro
cessor-based system. It will usually, however, lead to a more complex
circuit design. The present trend is to develop single chip signal
processors to replace the more complex multichip bit slice designs.
REFERENCES
Advanced ~licro Devices. Array and Digital Sign.al Processing Products. Sunnyvale, CA: Advanced Ydcro Devices, 1982.
Cushman, Robert H. "Digitization Is on the Way for FFT Designs." Electronic Design News (August 1981): 99-106.
Eldon, John. Correlation-- A Powerful Technique for Digital Signal Processing. La Jolla, CA: TRW, 1981.
Gordon Ehud. "Speed AP Arithmetic Functions While Using Less Software." Electronics Design News (Nay 1982): 167-171.
Hill, Fredrick J., and Peterson, Gerald R. Digital Systems-- Hardware Organi zation and Design• New York: John Wiley, 1978.
Intel Corporation. Intel Component Data Catalog. Santa Clara, CA: Intel Corporation, 1981.
Nick, John R. "Understanding the .AM25LS2517 and AM25LS381." .AND Bipolar Mi croprocessor Logic and Interface Data Book. Sunnyvale, GA : Advanced I'-U.cro Devices, 1982.
1onolithic Memories Inc. :Monolithic Memories Bipolar LSI Databook. Sunnyvale, CA: Monolithic Memories Inc., 1981.
Eotorola Inc. Motorola Memory Data Manual. Austin, TX: Motorola Inc., 1980.
Rabiner, Lawrence R., and Gold, Bernard. Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
White, Donnamaie E. Bit Slice Design: Controllers and ALUs. New York: Garland STPl'fi Press, 1981.