Post on 24-Mar-2018
Chapter 9
Sequential Circuit Design:
Practice
ECOM 4311
Digital System Design using VHDL
2
Outline
1. Poor design practice and remedy
2. More counters
3. Register as fast temporary storage
4. Pipelined circuit
3
1. Poor design practice and remedy
Synchronous design is the most
important methodology
Poor practice in the past (to save chips)
• Misuse of asynchronous reset
• Misuse of gated clock
• Misuse of derived clock
4
Misuse of asynchronous reset
• Poor design: use reset to clear register in normal operation.
• e.g., a poorly mod-10 counter
oClear register immediately after the counter reaches 1010
5
Misuse of asynchronous reset
6
Problem
• Glitches in transition 1001 (9) => 0000 (0)
• Glitches in aync_clr can reset the counter
• How about timing analysis? (maximal clock
rate)
Asynchronous reset should only be used
for power-on initialization
Remedy: load “0000” synchronously
Misuse of asynchronous reset
7
Remedy: load “0000” synchronously
Remedy
8
Misuse of gated clock
• Poor design: use a and gate to disable the
clock to stop the register to get new value
• E.g., a counter with an enable signal
9
Problem
• Gated clock width can be narrow
• Gated clock may pass glitches of en
• Difficult to design the clock distribution
network
Misuse of gated clock
10
• Remedy: use a synchronous enable
11
Misuse of derived clock
• Subsystems may run at different clock rate
• E.g., second and minutes counter
o Input: 1 MHz clock
o Poor design:
12
o Better design
13
2. More counters
• Counter circulates a set of specific patterns
• Counter: o Binary
o Gray counter
o Ring counter
o Linear Feedback Shift Register (LFSR)
o BCD counter
14
o State follows binary counting sequence
oUse an incrementor for the next-state logic
d
clk
q
reset
+1r_reg r_next
reset
clk
q
Binary counter
15
o State changes one-
bit at a time
oUse a Gray
incrementor
Gray counter:
16
Gray counter:
17
Gray counter:
18
Gray counter:
19
Ring counter
Circulate a single 1
E.g., 4-bit ring counter: 1000, 0100, 0010, 0001
n patterns for n-bit register
Non self-correcting design
• Insert “0001” at initialization and circulate the pattern in normal operation
• Fastest counter
20
Ring counter
21
Ring counter
22
Ring counter
This simple design makes this a very fast counter (much faster than binary cnter)
23
Self-correcting design:
A self-correcting design ensures that a ’1’ is
always circulating in the ring
This is accomplished by inspecting the 3 MSBs if
"000", then the combo. Logic inserts a ’1’ into the
low order bit
24
Self-correcting design:
25
Self-correcting design:
26
LFSR (Linear Feedback Shift Reg)
An LFSR is a shifter register that contains
an XOR feedback network that determines
the next serial input value
Only a subset of the register bits are used
in the XOR operation
By carefully selecting the bits, an LFSR can
be designed to circulate through all 2n-1
states for an n-bit register
27
• E.g, 4-bit LFSR
Note that the state "0000" is excluded -- if it ever shows up, the
LFSR becomes stuck
28
Property of LFSR
• N-bit LFSR can cycle through 2n-1 states
• The feedback circuit to generate a maximal
number of states exists for any n
• The sequence is pseudorandom
29
Application of LFSR
• Pseudorandom: used in testing, data
encryption/decryption
30
4-bit LFSR
31
4-bit LFSR
32
4-bit LFSR
Read remaining of Section 9.2.3 (design to including
00..00 state)
33
Decimal Counter
A decimal counter circulates the patterns
in binary-coded decimal (BCD) format. The
BCD
code uses 4 bits to represent a decimal
number. For example, the BCD code for
the three digit decimal number 139 is "0001
001 1 1001".
34
Decimal Counter
35
Decimal Counter
36
Decimal Counter
37
Decimal Counter
38
PWM (pulse width modulation)
Duty cycle: percentage of time that the
signal is asserted
PWM: use a signal, w, to specify the duty
cycle
• Duty cycle is w/16 if w is not “0000”
• Duty cycle is 16/16 if w is “0000”
Implemented by a binary counter with a
special output circuit
39
PWM (pulse width modulation)
40
PWM (pulse width modulation)
41
PWM (pulse width modulation)
42
Register as fast temporary storage
RAM • RAM cell designed at transistor level
• Cell use minimal area
• Behave like a latch
• For mass storage
• Need a special interface logic
Register • D FF requires much larger area
• Synchronous
• For small, fast storage
• E.g., register file, fast FIFO, Fast CAM (content addressable memory)
43
Register file
Registers arranged as an 1-d array
Each register is identified with an address
Normally has 1 write port (with write enable
signal)
Can has multiple read ports
44
• E.g., 4-word register file with 1 write port
and two read ports
45
Register array:
• 4 registers with 16 bits
• Each register has an enable signal
Write decoding circuit:
• 0000 if wr_en is 0
• 1 bit asserted according to w_addr if wr_en is 1
Read circuit:
• A mux for each read port
Register file
46
VHDL Implementation code
A 2-D data type is needed here
47
VHDL Implementation code
51
FIFO Buffer
• “Elastic” storage between two subsystems
52
Circular queue implementation
Use two pointers and a “generic storage”
• Write pointer: point to the empty slot before
the head of the queue
• Read pointer: point to the tail of the queue
FIFO Buffer
54
FIFO Buffer
55
FIFO controller
• Read and write pointers: 2 counters
• Status circuit:
• Difficult
• Design 1: Augmented binary counter
• Design 2: with status FFs
• LSFR as counter
56
Augmented binary counter:
• increase the counter by 1 bits
• Use LSBs for as register address
• Use MSB to distinguish full or empty
57
FIFO controller with Augmented counter
58
FIFO controller with Augmented counter
59
60
2 extra status FFs
• Full_erg/empty_reg memorize the current
staus
• Initialized as 0 and 1
• Modified according to wr and rd signals:
• 00: no change
• 11: advance read pointer/write pointer; full/empty
no change
• 10: advance write pointer; de-assert empty; assert
full if needed (when write pointer=read pointer)
• 01: advance read pointer; de-assert full; asserted
empty if needed (when write pointer=read pointer)
61
2 extra status FFs
64
67
Other design
Non-binary counter for the pointer
• Exact location does not matter as long as the
write pointer and read pointer follow the same
pattern
• Other counters can be used for the second
scheme
• E.g, use LFSR
68
4. Pipelined circuit
• Two performance criteria:
o Delay: required time to complete one task
o Throughput: number of tasks completed per unit time.
• E.g., ATM machine
o Original: 3 minutes to process a transaction
delay: 3 min; throughput: 20 trans per hour
o Option 1: faster machine 1.5 min to process
delay: 1.5 min; throughput: 40 trans per hour
o Option 2: two machines
delay: 3 min; throughput: 40 trans per hour
• Pipelined circuit: increase throughput
69
Pipeline: overlap certain operation
E.g., pipelined laundry:
70
Pipeline
Non-pipelined:
• Delay: 60 min
• Throughput 1/60 load per min
Pipelined:
• Delay: 60 min
• Throughput k/(40+k*20) load per min
about 1/20 when k is large
• Throughput 3 times better than non-pipelined
71
Pipelined combinational circuit
Basic idea is to divide the combinational logic into a set of
stages, with buffers (registers or latches) inserted
between each stage
72
Pipelined combinational circuit
Given a pipeline with stage delays of T1, T2, T3 and T4,
clock cycle time is bounded by :
ADD the setup and clock-to-q delays of the pipeline
registers
In non-pipelined version, delay to process one item is
For the pipelined version, its actually longer
73
Pipelined combinational circuit
The win is actually w.r.t. the throughput
metric
For pipelined version, it takes 3*Tc time to
fill the pipeline and the time to process k
items is 3*Tc + kTc yielding :
TP = k/(3*Tc + kTc)
which approaches 1/Tc for large k
74
Adding pipeline to a comb circuit
• Not all circuits are amenable to pipelines.
• The following criteria required for pipelining:
oData is always available for the pipelined
circuit’s inputs
o System throughput is an important
performance characteristic
oCombinational circuit can be divided into
stages with similar propagation delays
o Propagation delay of a stage is much larger
than the Tsetup and Tcq of the register
75
• Derive the block diagram of the original
combinational circuit and arrange the circuit as a
cascading chain
• Identify the major components and estimate the
relative propagation delays of these components
• Divide the chain into stages of similar
propagation delays
• Identify the signals that cross the boundary of
the chain
• Insert registers for these signals in the
boundary.
Procedure
76
Pipelined comb multiplier
77
The two major components are the adder
and bit-product generation circuit
Arrange this components in cascade as
shown on the next slide, with the bit-
product labeled as BP
The bit-product circuit is simply an AND
operation and therefore has a small delay
We combine it with the adder to define a
stage
Pipelined comb multiplier
78
79
Pipelined comb multiplier
There are two types of pipeline registers
• One type to accommodate the computation
flow and to store the intermediate results
(partial products pp1, ... pp4)
• Second type to preserve the info needed in
each stage, i.e., a1, a2, a3, b1, b2, and b3
• Since there are different multiplications
occurring in each stage, the operands for any
given multiplication must move along with the
partial products
80
Pipelined comb multiplier
81
82
83
84
85
87
88
89
Congratulation! Your first pipeline….
90
?
Any Questions?