Chapter 9 Sequential Circuit Design:...

Post on 24-Mar-2018

238 views 2 download

Transcript of Chapter 9 Sequential Circuit Design:...

Chapter 9

Sequential Circuit Design:

Practice

ECOM 4311

Digital System Design using VHDL

2

Outline

1. Poor design practice and remedy

2. More counters

3. Register as fast temporary storage

4. Pipelined circuit

3

1. Poor design practice and remedy

Synchronous design is the most

important methodology

Poor practice in the past (to save chips)

• Misuse of asynchronous reset

• Misuse of gated clock

• Misuse of derived clock

4

Misuse of asynchronous reset

• Poor design: use reset to clear register in normal operation.

• e.g., a poorly mod-10 counter

oClear register immediately after the counter reaches 1010

5

Misuse of asynchronous reset

6

Problem

• Glitches in transition 1001 (9) => 0000 (0)

• Glitches in aync_clr can reset the counter

• How about timing analysis? (maximal clock

rate)

Asynchronous reset should only be used

for power-on initialization

Remedy: load “0000” synchronously

Misuse of asynchronous reset

7

Remedy: load “0000” synchronously

Remedy

8

Misuse of gated clock

• Poor design: use a and gate to disable the

clock to stop the register to get new value

• E.g., a counter with an enable signal

9

Problem

• Gated clock width can be narrow

• Gated clock may pass glitches of en

• Difficult to design the clock distribution

network

Misuse of gated clock

10

• Remedy: use a synchronous enable

11

Misuse of derived clock

• Subsystems may run at different clock rate

• E.g., second and minutes counter

o Input: 1 MHz clock

o Poor design:

12

o Better design

13

2. More counters

• Counter circulates a set of specific patterns

• Counter: o Binary

o Gray counter

o Ring counter

o Linear Feedback Shift Register (LFSR)

o BCD counter

14

o State follows binary counting sequence

oUse an incrementor for the next-state logic

d

clk

q

reset

+1r_reg r_next

reset

clk

q

Binary counter

15

o State changes one-

bit at a time

oUse a Gray

incrementor

Gray counter:

16

Gray counter:

17

Gray counter:

18

Gray counter:

19

Ring counter

Circulate a single 1

E.g., 4-bit ring counter: 1000, 0100, 0010, 0001

n patterns for n-bit register

Non self-correcting design

• Insert “0001” at initialization and circulate the pattern in normal operation

• Fastest counter

20

Ring counter

21

Ring counter

22

Ring counter

This simple design makes this a very fast counter (much faster than binary cnter)

23

Self-correcting design:

A self-correcting design ensures that a ’1’ is

always circulating in the ring

This is accomplished by inspecting the 3 MSBs if

"000", then the combo. Logic inserts a ’1’ into the

low order bit

24

Self-correcting design:

25

Self-correcting design:

26

LFSR (Linear Feedback Shift Reg)

An LFSR is a shifter register that contains

an XOR feedback network that determines

the next serial input value

Only a subset of the register bits are used

in the XOR operation

By carefully selecting the bits, an LFSR can

be designed to circulate through all 2n-1

states for an n-bit register

27

• E.g, 4-bit LFSR

Note that the state "0000" is excluded -- if it ever shows up, the

LFSR becomes stuck

28

Property of LFSR

• N-bit LFSR can cycle through 2n-1 states

• The feedback circuit to generate a maximal

number of states exists for any n

• The sequence is pseudorandom

29

Application of LFSR

• Pseudorandom: used in testing, data

encryption/decryption

30

4-bit LFSR

31

4-bit LFSR

32

4-bit LFSR

Read remaining of Section 9.2.3 (design to including

00..00 state)

33

Decimal Counter

A decimal counter circulates the patterns

in binary-coded decimal (BCD) format. The

BCD

code uses 4 bits to represent a decimal

number. For example, the BCD code for

the three digit decimal number 139 is "0001

001 1 1001".

34

Decimal Counter

35

Decimal Counter

36

Decimal Counter

37

Decimal Counter

38

PWM (pulse width modulation)

Duty cycle: percentage of time that the

signal is asserted

PWM: use a signal, w, to specify the duty

cycle

• Duty cycle is w/16 if w is not “0000”

• Duty cycle is 16/16 if w is “0000”

Implemented by a binary counter with a

special output circuit

39

PWM (pulse width modulation)

40

PWM (pulse width modulation)

41

PWM (pulse width modulation)

42

Register as fast temporary storage

RAM • RAM cell designed at transistor level

• Cell use minimal area

• Behave like a latch

• For mass storage

• Need a special interface logic

Register • D FF requires much larger area

• Synchronous

• For small, fast storage

• E.g., register file, fast FIFO, Fast CAM (content addressable memory)

43

Register file

Registers arranged as an 1-d array

Each register is identified with an address

Normally has 1 write port (with write enable

signal)

Can has multiple read ports

44

• E.g., 4-word register file with 1 write port

and two read ports

45

Register array:

• 4 registers with 16 bits

• Each register has an enable signal

Write decoding circuit:

• 0000 if wr_en is 0

• 1 bit asserted according to w_addr if wr_en is 1

Read circuit:

• A mux for each read port

Register file

46

VHDL Implementation code

A 2-D data type is needed here

47

VHDL Implementation code

51

FIFO Buffer

• “Elastic” storage between two subsystems

52

Circular queue implementation

Use two pointers and a “generic storage”

• Write pointer: point to the empty slot before

the head of the queue

• Read pointer: point to the tail of the queue

FIFO Buffer

54

FIFO Buffer

55

FIFO controller

• Read and write pointers: 2 counters

• Status circuit:

• Difficult

• Design 1: Augmented binary counter

• Design 2: with status FFs

• LSFR as counter

56

Augmented binary counter:

• increase the counter by 1 bits

• Use LSBs for as register address

• Use MSB to distinguish full or empty

57

FIFO controller with Augmented counter

58

FIFO controller with Augmented counter

59

60

2 extra status FFs

• Full_erg/empty_reg memorize the current

staus

• Initialized as 0 and 1

• Modified according to wr and rd signals:

• 00: no change

• 11: advance read pointer/write pointer; full/empty

no change

• 10: advance write pointer; de-assert empty; assert

full if needed (when write pointer=read pointer)

• 01: advance read pointer; de-assert full; asserted

empty if needed (when write pointer=read pointer)

61

2 extra status FFs

64

67

Other design

Non-binary counter for the pointer

• Exact location does not matter as long as the

write pointer and read pointer follow the same

pattern

• Other counters can be used for the second

scheme

• E.g, use LFSR

68

4. Pipelined circuit

• Two performance criteria:

o Delay: required time to complete one task

o Throughput: number of tasks completed per unit time.

• E.g., ATM machine

o Original: 3 minutes to process a transaction

delay: 3 min; throughput: 20 trans per hour

o Option 1: faster machine 1.5 min to process

delay: 1.5 min; throughput: 40 trans per hour

o Option 2: two machines

delay: 3 min; throughput: 40 trans per hour

• Pipelined circuit: increase throughput

69

Pipeline: overlap certain operation

E.g., pipelined laundry:

70

Pipeline

Non-pipelined:

• Delay: 60 min

• Throughput 1/60 load per min

Pipelined:

• Delay: 60 min

• Throughput k/(40+k*20) load per min

about 1/20 when k is large

• Throughput 3 times better than non-pipelined

71

Pipelined combinational circuit

Basic idea is to divide the combinational logic into a set of

stages, with buffers (registers or latches) inserted

between each stage

72

Pipelined combinational circuit

Given a pipeline with stage delays of T1, T2, T3 and T4,

clock cycle time is bounded by :

ADD the setup and clock-to-q delays of the pipeline

registers

In non-pipelined version, delay to process one item is

For the pipelined version, its actually longer

73

Pipelined combinational circuit

The win is actually w.r.t. the throughput

metric

For pipelined version, it takes 3*Tc time to

fill the pipeline and the time to process k

items is 3*Tc + kTc yielding :

TP = k/(3*Tc + kTc)

which approaches 1/Tc for large k

74

Adding pipeline to a comb circuit

• Not all circuits are amenable to pipelines.

• The following criteria required for pipelining:

oData is always available for the pipelined

circuit’s inputs

o System throughput is an important

performance characteristic

oCombinational circuit can be divided into

stages with similar propagation delays

o Propagation delay of a stage is much larger

than the Tsetup and Tcq of the register

75

• Derive the block diagram of the original

combinational circuit and arrange the circuit as a

cascading chain

• Identify the major components and estimate the

relative propagation delays of these components

• Divide the chain into stages of similar

propagation delays

• Identify the signals that cross the boundary of

the chain

• Insert registers for these signals in the

boundary.

Procedure

76

Pipelined comb multiplier

77

The two major components are the adder

and bit-product generation circuit

Arrange this components in cascade as

shown on the next slide, with the bit-

product labeled as BP

The bit-product circuit is simply an AND

operation and therefore has a small delay

We combine it with the adder to define a

stage

Pipelined comb multiplier

78

79

Pipelined comb multiplier

There are two types of pipeline registers

• One type to accommodate the computation

flow and to store the intermediate results

(partial products pp1, ... pp4)

• Second type to preserve the info needed in

each stage, i.e., a1, a2, a3, b1, b2, and b3

• Since there are different multiplications

occurring in each stage, the operands for any

given multiplication must move along with the

partial products

80

Pipelined comb multiplier

81

82

83

84

85

87

88

89

Congratulation! Your first pipeline….

90

?

Any Questions?