Evolution of Latches and Flip-Flops-1


Introduction

• The direction taken by research in a field is a function of

– the prevailing design philosophy

– the requirements imposed on that field by other disciplines

– the shortcomings of existing designs/systems.

• These three factors, together with the creative minds of the times, have yielded the most prominent and outstanding designs.

1. The Functionality Theme

• The main tenet of combinatorial circuits is that the outputs are a function of the current inputs only; there is no mechanism to store a previous input. The most primitive "memory" element is the SR latch, whose output can be asynchronously set to 1 or reset to 0. In the absence of these two signals, the SR latch simply retains the last output.

• Problem: the output is undefined when both S and R are asserted.

• Solution: the JK latch was designed so that the output toggles whenever both inputs are asserted.

• From the JK latch, the T latch (toggle latch: the same input is fed to both J and K) and the D latch (delay latch: the input and its inverse are fed into the J and K inputs) can easily be constructed.

• The problem with the JK latch is that the output is extremely sensitive to the duration of the J and K pulses whenever both signals are asserted. If the pulses last longer than the feedback time, the output toggles repeatedly. The only cure is an edge-triggered device, which samples the input on the transition of a control signal rather than on its level. Such a device can be constructed from two latches placed in a master-slave configuration.

• In Fig.b, two latches are used in cascade to yield a flip-flop. To ensure reliable transfer of signals, the concepts of setup and hold time came into existence.

• Under the functionality theme, the behavioral/functional model of latches and flip-flops was fully defined. The implementation of these latches and flip-flops, however, was still at the gate level.
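
The toggling hazard of the level-sensitive JK latch can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not a circuit simulation: time is counted in integer units, `feedback_delay` is a hypothetical name for the latch's feedback time, and the latch is assumed to toggle once for every feedback delay (or part of one) during which J = K = 1.

```python
def level_sensitive_jk(q0, pulse_width, feedback_delay):
    """Toy model of a level-sensitive JK latch with J = K = 1.

    Assumption: the output toggles once per feedback delay (or fraction
    of one) for as long as the input pulse stays high.
    Returns (final_q, number_of_toggles).
    """
    toggles = (pulse_width + feedback_delay - 1) // feedback_delay  # integer ceiling
    return q0 ^ (toggles % 2), toggles


def edge_triggered_jk(q0):
    """Edge-triggered device: exactly one toggle per active clock edge,
    regardless of how long J and K stay asserted."""
    return q0 ^ 1, 1
```

With `feedback_delay = 2`, a pulse of width 2 toggles the output once, whereas a pulse of width 5 toggles it three times; the edge-triggered version toggles once in both cases.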

Basic "memory" elements:

(a) A NAND gate-based SR latch with its truth table. The S and R inputs are active low and are represented as Sb and Rb, respectively.

(b) A NAND gate-based JK latch derived from the SR latch in Fig.a with its truth table. The J and K inputs are active high.

(c) A T (toggle) latch derived from the JK latch with its truth table.

(d) A D (delay) latch derived from the JK latch with its truth table.
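
The behavior of these four elements can be written as next-state functions. The sketch below is behavioral only (no gate delays); the function names are illustrative, and returning `None` for the undefined SR case is a modeling choice made here, not something taken from the figures.

```python
def sr_nand_latch(sb, rb, q):
    """NAND-based SR latch with active-low inputs Sb and Rb.
    Returns the next Q, or None when both inputs are asserted (undefined)."""
    if sb == 0 and rb == 0:
        return None          # both asserted: output undefined
    if sb == 0:
        return 1             # set
    if rb == 0:
        return 0             # reset
    return q                 # hold the last output


def jk_latch(j, k, q):
    """JK latch with active-high inputs: toggles when both are asserted."""
    if j and k:
        return 1 - q
    if j:
        return 1
    if k:
        return 0
    return q


def t_latch(t, q):
    """T latch: the same input feeds both J and K."""
    return jk_latch(t, t, q)


def d_latch(d, q):
    """D latch: the input feeds J and its inverse feeds K."""
    return jk_latch(d, 1 - d, q)
```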

2. The Synchronous Theme

• To design large circuits and ensure a robust design, it is best to synchronize all activities with respect to a single global clock.

• Because the outputs of circuits must be sampled and stored at each clock edge to provide a stable set of inputs to subsequent circuits, the role of latches and flip-flops grew in importance.

• Because latches and flip-flops in synchronous circuits were used mainly as delay elements to buffer input, the words latch and flip-flop became synonymous with the D latch and flip-flop.
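
The role of the D flip-flop as a delay element can be sketched with a chain of flip-flops that all sample on the same clock edge. This is a behavioral illustration only; `clock_edge` and `run` are hypothetical helper names, and all stages are assumed to start at 0.

```python
def clock_edge(state, d_in):
    """One global clock edge: every flip-flop samples at the same time, so
    each stage takes the value its predecessor held before the edge."""
    return [d_in] + state[:-1]


def run(stream, n_stages):
    """Feed a bit stream through n_stages D flip-flops (all initially 0)
    and record the last stage's output after each clock edge."""
    state = [0] * n_stages
    outputs = []
    for d in stream:
        state = clock_edge(state, d)
        outputs.append(state[-1])
    return outputs
```

With two stages, each input bit reaches the output one clock edge after it is sampled, so `run([1, 0, 1, 1], 2)` yields `[0, 1, 0, 1]`.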

3. The Optimization Theme

• The Fig. 5.1b implementation of the Master-Slave flip-flop, based on the clocked SR flip-flop of Fig. 5.3, requires 44 transistors, a hefty portion of the silicon real estate.

• Simpler implementations can be done in 3 ways:

A). A fully static implementation of the SR flip-flop

B). A RAM-style-based SR latch.

C). CVSL (Cascode Voltage Switch Logic)

(a) A fully static clocked SR latch with S and R as active high inputs

• Requires 12 transistors

• It switches faster

• Consumes less power

(b) a RAM-style clocked SR latch with S and R as active high inputs

• While switching, the n-pull-downs must fight against the p-pull-ups.

• Slower than fully static implementation

• Requires careful sizing to ensure proper switching.

• Requires only eight transistors => fulfills the optimization theme.

(c) a CVSL-style clocked SR latch with S and R as active high inputs

• Has the n-transistors in the feedback sections.

• Extremely sensitive to Wp/Wn ratios though it requires only 6 transistors.

• As complete inverters are absent, the CVSL latch does not have the advantage of positive feedback available to the RAM-style circuit.

• The slowest of the three.

• Requires complementary inputs.

• Assuming that the S and R inputs are never asserted together, this implementation of the SR latch using the CVSL logic will suffice.
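
The transistor counts quoted for the three clocked SR latch styles can be compared with a little arithmetic. The counts come from the text above; the dictionary and the function name are illustrative.

```python
# Transistor counts quoted above for the three clocked SR latch styles.
LATCH_TRANSISTORS = {"fully static": 12, "RAM-style": 8, "CVSL": 6}


def savings_vs(baseline, style):
    """Fraction of transistors saved by `style` relative to `baseline`."""
    base = LATCH_TRANSISTORS[baseline]
    return 1 - LATCH_TRANSISTORS[style] / base
```

Relative to the fully static latch, the RAM-style latch saves one third of the transistors and the CVSL latch saves half, at the cost of the speed and sizing sensitivities listed above.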

• A large decrease in the number of transistors is possible by using the bidirectionality feature of the MOS transistor, that is, the pass transistor design style (Fig.5.5a).

• The pass transistor version of the D flip-flop requires merely 12 transistors. However, pass transistors suffer from a lack of full swing: the logic 1 level transferred is degraded to approximately VDD - Vtn.

• This deficiency can be overcome by using transmission gates (Fig 5.5b).

• As a result of bidirectionality, neither a pass transistor nor a transmission gate isolates the input from the output.
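
The degraded logic 1 level of a pass transistor can be put into numbers. The sketch below uses the first-order relation stated above and ignores the body effect; the supply and threshold values in the usage note are assumed for illustration and do not come from the source.

```python
def pass_transistor_high(vdd, vtn):
    """Logic 1 level passed by a single n-type pass transistor
    (first-order estimate, body effect ignored)."""
    return vdd - vtn


def transmission_gate_high(vdd):
    """A transmission gate restores full swing: the p-device passes VDD."""
    return vdd
```

For an assumed VDD of 3.3 V and Vtn of 0.7 V, the pass transistor delivers about 2.6 V, while the transmission gate delivers the full 3.3 V.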

• INV2 is a weak inverter made using low-gain p- and n-devices.

• Because of the bidirectionality of the transmission gate, the setup time of this latch would be high given that INV2 would have to drive the parasitic capacitance at the input.

• Also, if the input is being driven by another logic circuit and the clock is high, the feedback from the latch could prevent the input from switching or, at the least, impede its switching.

• As Figs. 5.5 and 5.6 show, a reduction in area is generally obtained by making appropriate tradeoffs with speed, power, or robustness.

4. The Performance Theme

• Area was the main object of concern. The design shown uses 6 transistors but requires 2 complementary clocks. The problem with providing 2 complementary clocks is the difficulty of routing them so that they stay near-perfectly aligned. This spurred the idea of two- and four-phase clocks.

• Fig.a shows a two-phase clock with two phases φ1 and φ2, and Figure 5.8b shows a four-phase clock with phases φ1, φ2, φ3 & φ4.

• Consequently, clock and clockb of the flip-flop can be replaced with φ1 & φ2, respectively, thereby producing a flip-flop free from feedthrough problems and race conditions.

• When the two-phase clock is used in static circuits, the time t12 (the delay between φ1 being high and φ2 being high) must be as small as possible to ensure that the feedback action comes into operation as soon as the input is switched off.

• However, t12 must be made as large as possible to eliminate feed through. These two competing demands place tight constraints on the designers.
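
The tension around t12 can be explored with a small discrete-time model of a two-phase clock. This is a sketch under assumptions: time is in integer units, both phases have the same high width, and `skew` is a hypothetical parameter for how much earlier than intended φ2 arrives.

```python
def phi1(t, period, width):
    """phi1 is high for the first `width` time units of every period."""
    return 1 if t % period < width else 0


def phi2(t, period, width, t12, skew=0):
    """phi2 nominally rises t12 units after phi1 falls.
    A positive skew makes phi2 arrive that many units early."""
    start = width + t12 - skew
    return 1 if (t - start) % period < width else 0


def phases_overlap(period, width, t12, skew=0):
    """True if phi1 and phi2 are ever high at the same time within a period."""
    return any(
        phi1(t, period, width) and phi2(t, period, width, t12, skew)
        for t in range(period)
    )
```

With a period of 20, a width of 6, and t12 = 2, the phases stay non-overlapping for a skew of up to 2 units and overlap once the skew reaches 3. A larger t12 tolerates more skew but delays the moment φ2 can act.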

• φ1 & φ3 are used for precharging. Because φ2 & φ3 are non-overlapping, the evaluation of the first stage occurs before the second stage is precharged.

• Suffers from charge-sharing problems between nodes X and Y. The way to avoid this problem is to ensure that φ2 is on when node X is being precharged. This practice ensures that node Y is charged from VDD rather than by charge sharing with node X. Similarly, φ4 is on when node Y is precharged.

• To ensure that no skew develops between different clock phases of these two- and four-phase clocks, each phase must be routed in an identical manner to the others.

• Two long parallel wires tend to suffer from crosstalk.

• For chip operating speeds above 100 MHz, it is difficult to generate nonoverlapping clocks and control the clock skew properly in a VLSI chip because of statistical variations of components in the clock distribution path.

• The difficulty involved in routing so many clock phases rekindled interest in a single-phase clock and single-phase structures.
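
The charge-sharing problem between nodes X and Y mentioned above follows from charge conservation. The sketch below computes the voltage two capacitive nodes settle to when connected; the capacitance and voltage values in the usage note are assumed for illustration.

```python
def charge_share(cx, vx, cy, vy):
    """Final common voltage when node X (capacitance cx, voltage vx) is
    connected to node Y (capacitance cy, voltage vy).
    Total charge cx*vx + cy*vy is conserved."""
    return (cx * vx + cy * vy) / (cx + cy)
```

For example, if node X holds 3.0 V on an assumed 30 fF and node Y is discharged with an assumed 10 fF, both nodes settle at 2.25 V, a degraded high level. Charging node Y from VDD instead avoids this drop.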

5. The Pipelining Theme

• The basic motivation behind pipelining is to divide a task into a number of subtasks and overlap the execution of these subtasks.

• This is achieved by breaking up a large combinatorial circuit into many stages by inserting registers between them and overlapping the execution of these stages.

• Implementing this idea requires not just fast flip-flops but a consistent methodology by which these latches and flip-flops can be interspersed with logic to yield fast pipeline structures.

• The first improvements came in the form of dynamic C2MOS latches and flip-flops. Because dynamic circuits require continuous refreshing, they are best used in pipelines where the data is regularly overwritten.

• C2MOS latches are clock skew resistant. In the event of φ and φb overlap, these circuits do not exhibit feedthrough because skew turns on either both pull-down sections or both pull-up sections, but never a pull-up and a pull-down section together.

• In either case, the input cannot feed through to the output. The only exception arises when inverting logic is placed between the two latches.

• In this case, a pull-up section and a pull-down section are active simultaneously, and this activity may cause the input to feed through to the output.

• Advantage: the layout of the C2MOS circuit is more compact.
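
The benefit of breaking a combinatorial circuit into registered stages, as described at the start of this section, can be estimated with a common first-order timing model. The model is an assumption, not taken from the source: the clock period is set by the slowest stage plus a fixed register overhead, and `register_overhead` is a hypothetical lumped term for setup time and clock-to-output delay.

```python
def pipeline_timing(stage_delays, register_overhead):
    """First-order pipeline timing.

    cycle_time: slowest stage plus register overhead
    latency:    time for one item to traverse all stages
    speedup:    throughput gain over the same logic with a single register
    """
    cycle = max(stage_delays) + register_overhead
    unpipelined = sum(stage_delays) + register_overhead
    return {
        "cycle_time": cycle,
        "latency": cycle * len(stage_delays),
        "speedup": unpipelined / cycle,
    }
```

For assumed stage delays of 4, 5, and 3 ns with 1 ns of register overhead, the cycle time is 6 ns instead of 13 ns, a speedup of about 2.2, while the latency of a single item rises to 18 ns. This is why register overhead matters as much as raw flip-flop speed.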

• The first consistent framework for pipelining came with the NORA (No-RAce) technique.

• The NORA technique evolved from the Domino logic where the outputs of all cascaded stages go to logic 0 on precharge (Fig.5.11).

• During the evaluation period, the stages are successively evaluated much like a row of dominos falling.

• In Domino logic, all of the logic is placed in n-logic blocks; in NORA logic, however, it alternates between n- and p-logic blocks.

• One n-logic block combined with another p-logic block and a C2MOS latch form a separate section.

• Based on the clock phase used to precharge the section, these sections are divided into two types: the φ-sections, shown in Fig.5.12a, which are precharged when φ is low, and the φb-sections, illustrated in Fig.5.12(b), which are precharged when φb is low.

• When the logic in the φ-section is being precharged, the C2MOS latch holds the data at the output. At the same time, the φb-section is evaluating. Therefore, if the φ-sections and the φb-sections are interleaved, data is transferred to the output in a pipelined fashion.

• While the NORA provides a consistent framework for fast pipelines, some conditions are placed on the logic as follows:

1. The number of static inversions between two C2MOS latches should be even (in the absence of dynamic gates).

2. The number of static inversions between a C2MOS latch and a dynamic gate should be even.
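
The two NORA conditions above can be expressed as a simple parity check. The sketch below is one reading of the rules, stated as an assumption: a path between two C2MOS latches is given as a list of gate labels, `"inv"` marks a static inverting gate, `"dyn"` marks a dynamic gate, and rule 2 is applied to the static gates between each latch and its nearest dynamic gate. The labels and the function name are illustrative.

```python
def nora_path_ok(path):
    """Check the NORA inversion-parity rules on a path between two
    C2MOS latches. `path` lists the gates in order: 'inv' is a static
    inverting gate, 'dyn' is a dynamic gate; other labels are ignored."""
    if "dyn" not in path:
        # Rule 1: no dynamic gates, so total static inversions must be even.
        return path.count("inv") % 2 == 0
    # Rule 2 (as interpreted here): even number of static inversions
    # between each latch and the dynamic gate nearest to it.
    first = path.index("dyn")
    last = len(path) - 1 - path[::-1].index("dyn")
    before = path[:first].count("inv")
    after = path[last + 1:].count("inv")
    return before % 2 == 0 and after % 2 == 0
```

For instance, `["inv", "inv"]` and `["inv", "inv", "dyn"]` pass the check, whereas `["inv"]` and `["inv", "dyn"]` violate it.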

• The C2MOS functionality could be achieved without using a two-phase clock; hence the concept of true single-phase clock (TSPC) latches and flip-flops.

• The TSPC did away with the rules imposed on logic by the NORA technique. Fig.5.13 shows the basic stages in TSPC logic.

• Fig.5.14 shows a negative edge-triggered TSPC-based flip-flop.

• This flip-flop contains nine transistors compared to the eight used in C2MOS. However, because routing a two- or four-phase clock was difficult, a single-phase solution was most welcome.

• Wave pipelining removes the need for latches and flip-flops and yields the greatest possible throughput. It takes the pipelining concept to the extreme, using logic element delays as the delay elements of the circuit.

6. The High-Performance and Low-Power Theme

• The major advantage of CMOS is its low static power dissipation, but trends have shown that the power dissipation of VLSI circuits has increased fourfold every 3 years.

• If this trend persists, the power dissipation of high-end ICs will soon exceed the practical limits of ceramic ICs. These facts lend credibility to all the hysteria over power dissipation.

• Low-power techniques such as voltage scaling can be applied to latches and flip-flops, but certain peculiarities still exist.

• The major research in this area has focused on identifying the relative contributions of the input and the clock to the power budget of a latch.

• Because the clocked nodes have a switching probability of 1, much effort has been expended to reduce the number of clocked transistors. As a result, a number of power-conscious latches and flip-flops have been proposed.
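
The weight of the clocked nodes in the power budget can be seen from the standard first-order expression for CMOS dynamic power, P = α·C·VDD²·f. The expression is a well-known approximation rather than something stated in the source, and the capacitance, supply, frequency, and data activity values below are assumed for illustration.

```python
def dynamic_power(alpha, c_load, vdd, f_clk):
    """First-order CMOS dynamic power in watts.

    alpha:  switching probability of the node per clock cycle
    c_load: node capacitance in farads
    vdd:    supply voltage in volts
    f_clk:  clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * f_clk


# A clocked node switches every cycle (alpha = 1); a data node switches
# less often (alpha = 0.25 is an assumed activity factor).
p_clock_node = dynamic_power(1.0, 10e-15, 3.3, 100e6)
p_data_node = dynamic_power(0.25, 10e-15, 3.3, 100e6)
```

Under these assumptions a clocked node dissipates four times as much as an equally loaded data node, which is why removing clocked transistors pays off. The same expression shows why voltage scaling is effective: halving VDD cuts the dynamic power to one quarter.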

• In single edge-triggered flip-flops, one edge of the clock is always redundant.

• In an effort to reduce unnecessary power dissipation, double edge-triggered designs are gaining prominence.

• Most double edge-triggered flip-flops constructed to date are composed of two single edge-triggered sections placed in parallel with each other (Sections A and B), followed by some combining logic to multiplex the output signals.

• A double edge-triggered flip-flop can handle a data rate twice that of a single edge-triggered flip-flop. This change allows these flip-flops to be clocked at half the frequency, thus reducing the power dissipation by half.

• Double edge-triggered flip-flops require almost twice the area of single edge-triggered flip-flops. This fact is the most prominent pointer toward a shift in designer priorities from small silicon area to low power.

• Considering the tremendous power dissipation due to the clock, researchers have started exploring other design paradigms such as self-timed circuits (which use hand-shaking rather than a global clock to synchronize) and globally synchronous and locally asynchronous designs.

• The advantage of self-timed circuits is that the switching activity is minimized in the absence of data. With a globally synchronous clock and locally asynchronous logic circuits, the loading on the clock can be greatly reduced and, thus, the clock can be run at a higher frequency. In addition, locally asynchronous logic circuits can be disabled when there is no operation. This effort will greatly reduce the power dissipation.
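
The double edge-triggered structure described earlier in this section (two single edge-triggered sections in parallel followed by a multiplexer) can be modeled behaviorally. This is a sketch, not a transistor-level design: assigning Section A to the rising edge and Section B to the falling edge, and selecting the output by clock level, are assumptions made here for illustration.

```python
class DoubleEdgeFlipFlop:
    """Behavioral model: Section A samples on the rising edge, Section B
    on the falling edge, and a multiplexer picks the section that was
    updated most recently."""

    def __init__(self):
        self.a = 0      # Section A (rising-edge) storage
        self.b = 0      # Section B (falling-edge) storage
        self.clk = 0    # last clock level seen

    def step(self, clk, d):
        if clk != self.clk:          # a clock edge occurred
            if clk == 1:
                self.a = d           # rising edge: Section A samples
            else:
                self.b = d           # falling edge: Section B samples
            self.clk = clk
        # Output multiplexer, selected by the clock level.
        return self.a if self.clk == 1 else self.b
```

Driving the model with the clock levels 1, 0, 1, 0 and the data values 1, 0, 1, 1 returns 1, 0, 1, 1: one data item is captured per clock edge, so the same data rate needs only half the clock frequency.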
