McEnroe Thesis CH2 Review of Literature

8/3/2019 McEnroe Thesis CH2 Review of Literature


Figure 1.-- FIFO Block Diagram

Introduction to FIFOs

The First-In First-Out concept provides a means to store data for retrieval in the exact input order. Desired properties include simultaneous and asynchronous read and write operations. Early FIFO devices were constructed of a parallel set of serial, linear shift registers. Data words presented to the input trickled through the shift registers to reach the output. A major disadvantage of these early FIFOs is their limited capacity and excessive delay (Miller 1985 sec. 14 p. 2). Because the data must propagate through each storage location, input to output latency time is O(K), where K is the capacity of the FIFO.
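The linear latency of a shift-register FIFO can be illustrated with a minimal sketch. It assumes a clocked approximation in which a word advances one stage per tick; the function name `ticks_to_output` is hypothetical and real fall-through devices are self-timed rather than clocked.

```python
def ticks_to_output(capacity):
    """Count ticks for one word to travel through an empty K-stage shift register."""
    stages = [None] * capacity   # stage 0 is the input, stage K-1 the output
    pending = "word"             # the word presented at the input
    ticks = 0
    while stages[-1] is None:
        # Every stage hands its contents to the next stage toward the output.
        stages = [pending] + stages[:-1]
        pending = None
        ticks += 1
    return ticks
```

Under this assumption the tick count grows in direct proportion to the capacity, which is the O(K) behavior described above.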

The second generation of FIFOs stores data in memory cells and uses internal logic to track the storage and retrieval addresses. The block diagram in fig. 1 shows the logical structure of a FIFO. The RAM array holds the data. A write pointer and read pointer contain the addresses of the last word written and the next word to be read. The flag logic tracks the relative locations of the two pointers. Both pointers start at 0x0000. The write pointer increments by one for each data word written. A difference between the two pointer values indicates the FIFO contains data. The read pointer increments for each word read until it equals the write pointer, whereupon the FIFO asserts the empty flag. When the full flag indicates K words in the FIFO, the write pointer value is one less than the read pointer value.

Figure 2.-- Static RAM Cell
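The pointer and flag behavior of a RAM-based FIFO can be sketched as a small software model. This is an illustration under stated assumptions, not the structure of any particular device: the class name `PointerFifo` is hypothetical, the write pointer here holds the next location to write, and the flags are derived from a word counter rather than from a direct pointer comparison.

```python
class PointerFifo:
    """Toy model of a RAM-based FIFO with read/write pointers and flags."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ram = [None] * capacity  # stands in for the RAM array
        self.write_ptr = 0            # assumption: next location to write
        self.read_ptr = 0             # next location to read
        self.count = 0                # hypothetical helper for the flag logic

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.capacity

    def write(self, word):
        if self.full:
            raise OverflowError("write to a full FIFO")
        self.ram[self.write_ptr] = word
        self.write_ptr = (self.write_ptr + 1) % self.capacity  # wrap around
        self.count += 1

    def read(self):
        if self.empty:
            raise IndexError("read from an empty FIFO")
        word = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.capacity    # wrap around
        self.count -= 1
        return word
```

The counter is a design convenience: when both pointers wrap around the same array, pointer equality alone cannot tell an empty FIFO from a full one, so hardware flag logic must carry some extra state.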

FIFOs are designed for simultaneous read and write operations in order to buffer data between two asynchronous entities. The current generation of FIFOs uses a dual port static RAM cell to allow these simultaneous reads and writes. To select a proper fault model, we will need to understand memory cell structures.

Static RAM (SRAM) cells are formed by a pair of cross-coupled inverters as shown in fig. 2. The value at node A represents the logical value of the data held by the cell. The value at node A is inverted at node B. Node B is fed back and inverted, maintaining the logic value at node A. Of the four transistors comprising the inverters, two transistors are always on, sinking current. The data value is written to or read from the cell via two access transistors. Enabling the transistors via the select line transfers the voltage levels at nodes A and B down to the sense amplifiers and then to the output pins.

A dual port static RAM (DPSRAM) cell has the same cross-coupled inverter structure. There is an additional set of access transistors as shown in fig. 3. The upper set is used for writes, the lower set is used for reads. This design allows the memory cell to be written and read at the same time. A DPSRAM chip uses a flag


cells, dual port static RAM and dynamic RAM. In addition to the memory array itself, FIFO structures typically include read and write address pointers as well as input and output buffers and registers. Each of these parts is susceptible to failure. In the next section we will examine papers on physical fault processes and develop a practical fault model for further study.

Fault Model

Our objective in this section is to create a fault model to simulate and analyze. The fault model chosen will be composed of the faults that occur during system use rather than faults screened out during burn-in testing. This requires examining faults identified both theoretically and experimentally and constructing a list of FIFO errors from the identified fault types.

Failure Modes

Study of faults starts with the baseline faults experienced on less complex ICs (Colbourne et al. 1974 p. 250). Colbourne et al. list the following package-related and chip-related failure modes for all integrated circuits:

    The assembly- and package-related failure modes include:
    1) open bond wires
    2) lifted bonds
    3) lifted chips
    4) hermeticity
    5) thermal intermittents

    The chip-related failure modes include:
    1) oxide faults
    2) metallization faults
    3) diffusion defects
    4) mechanical defects in chips
    5) design defects

Technical reports from the Rome Air Development Center (Coit et al. 1984), (Reliability Analysis Center 1989) support these failure modes. Identifying physical failure modes is but a first step in developing the fault model. Failure modes need to be translated into logical faults and parametric faults. Logical faults change the logic function of the device to some other function. Parametric faults change the timing, current or voltage levels of a circuit (Breuer and Friedman 1976 p. 15).

In the process of converting from physical failure modes to faults, we would want to eliminate failure modes which do not appear during system operation. Environmental stress screening rejects devices with latent defects, to prevent manufacturing defects from becoming service life failures (Tustin 1986). For

Figure 6.-- The Roller-Coaster Curve


fault types. Quite a few papers have chosen fault models as a subset of available fault types without experimental backup on relative fault frequencies (Cox 1978), (Ayache and Diaz 1979), (Dobbins 1986), (Midkiff and Koe 1989). Though this is the same procedure followed in this paper, there is a more scientific method of constructing a fault model for a specific device.

Inductive Fault Analysis

"Inductive fault analysis (IFA) is a systematic and automatic method for determining what faults are likely to occur on a specific circuit" (Ferguson and Shen 1988). IFA takes into account the manufacturing process, physical defect rates and device layout. The IFA computer program injects defects on a description of the physical layout and simulates device operation to determine what faults result. The research team of Dekker, Beenker and Thijssen (1988, 1989, and 1990) has used inductive fault analysis to construct and grade a more effective testing technique.

While inductive fault analysis appears to be a superior method for choosing a fault model, several restrictions prevented IFA use in this paper. First, one needs the detailed physical layout and the process defect statistics. Second, this technique seems more applicable in identifying manufacturing defects screened out as part of burn-in. As we learned back in the discussion on the roller-coaster curve, some latent defects will always pass through any inspection procedure. The IFA method clearly identifies some defects as non-faults (Ferguson and Shen 1988 p. 479). Though Ferguson and Shen do not identify what these exceptions were, they might be prime examples of


Pattern Sensitive Faults in the Moore neighborhood (see fig. 8b):
Only static faults where an individual cell modifies another individual cell are modeled.

Intermittent Faults:
Unidirectional burst faults from alpha particles hitting one or more cells.
Design timing defects and read/write pulse noise causing erroneous pointer operation.

Chapter III explains how the simulation implements each of these fault types.
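As an aside on the pattern sensitive fault entry above, the Moore neighborhood of a base cell is conventionally the set of up to eight cells that surround it. A minimal sketch follows, assuming a rectangular cell array addressed by (row, column); the function name `moore_neighborhood` is hypothetical and is not taken from the simulation described in Chapter III.

```python
def moore_neighborhood(row, col, rows, cols):
    """Return the cells surrounding (row, col) inside a rows x cols array."""
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the base cell itself
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:  # drop cells beyond the array edge
                neighbors.append((r, c))
    return neighbors
```

An interior cell has eight neighbors, while a corner cell has only three, so any fault model built on this neighborhood has to decide how to treat cells at the array boundary.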

Error Reduction Techniques

Having learned about failures in memory based FIFOs, it is time to examine what techniques are used for fault detection and error reduction.

RAM Architectures

There are several architectural approaches to RAM error reduction. Each approach is aimed at a different level of the memory system. Nothing precludes one or all of the approaches from being used simultaneously.

Memory cell techniques. For the case of alpha particle errors, design of the individual cell can have a dramatic impact on the error rate experienced (Takeuchi et al. 1989 p. 1644), (Takeda et al. 1989 p. 2567). An obvious error reduction technique is to redesign the memory cell. Minami et al. (1989) proposed a change in architecture from lateral to vertical memory cells, shielding critical areas from the alpha particles. Since chip packaging is the primary source of the alpha particles, different packages and barrier films on the chip have been tried (Sarrazin and Malek 1984 p. 53).

On-chip techniques. One method of dealing with manufacturing defects is to add spare decoders and spare word and bit lines (Schuster 1978 p. 698). Schuster explains that when faults are encountered, the defective column or row can be switched out for a spare. The changes can be latched in temporarily, burned in with a laser, or programmed using electrically programmable read only memory cells (EPROM). Chang, Fuchs, and Patel (1989) extend this concept to show how coupling faults can be diagnosed and then repaired by switching rows or columns.

On-chip error correction is another error reduction method. Yamada et al. (1984) implemented a bidirectional parity code which computes horizontal and vertical parities on a 4 bit cell group and is capable of correcting single bit errors. Fuja, Heegard, and Goodman (1988) provide the theory behind this technique as they explain single, double, and triple-error correcting linear sum codes.
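The idea behind a bidirectional parity code can be shown with a short sketch. It assumes the bits are arranged as a small rectangular block (2 x 2 in the usage note) with one stored parity bit per row and per column; the function names are hypothetical and the layout is an illustration, not the circuit of Yamada et al.

```python
def parities(block):
    """Return (row parities, column parities) of a rectangular block of bits."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(c) % 2 for c in zip(*block)]
    return rows, cols

def correct_single_error(block, stored_rows, stored_cols):
    """Flip the one bit whose row and column parities both disagree."""
    rows, cols = parities(block)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, stored_cols)) if a != b]
    fixed = [list(r) for r in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # The failing row and column intersect at the corrupted cell.
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed
```

For example, storing the parities of `[[1, 0], [1, 1]]` and then flipping the bit at row 0, column 1 leaves exactly one row and one column in disagreement, which locates the bit to flip back.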

System solutions. There are several error correcting codes useful at the system level to reduce errors. Bossen and Hsiao (1980) describe use of a single-error-correcting and double-error-detecting code with a hardware algorithm that clears soft errors in the presence of hard errors by remembering hard error locations. Grosspietsch (1988) describes a VLSI chip that remembers the location of hard faults and dynamically switches to spare bit-slices of the memory system.


Stroud and Sridhar report diagnostic runtimes reduced by three orders of magnitude compared to external test methods.

One problem with parallel signature analyzers is that the test is invalid if a redundant bit line is substituted for a defective bit line (Sridhar 1986 p. 20). Kraus et al. (1989) propose a modification to the test architecture to deal with this problem. During testing the faulty bit line is disconnected from the parallel comparator. Thus the indeterminate state of the cells in the bad column will not corrupt the signature.

Saluja et al. (1987) compare the BIST hardware overheads for random logic and microcoded ROM. Though microcoded ROM requires more silicon area, at chip sizes above 64K the difference between the two methods in percentage of the chip area used is negligible. Advantages of the microcoded ROM method include shorter design cycle and increased testability of the BIST hardware.

Franklin, Saluja, and Kinoshita (1989) use the fact that the increased speed of built-in self testing can make longer test sequences practical. If the tester is limited to O(n) steps, then only pattern sensitive fault models which consider the immediate neighborhood are practical. Franklin et al. can test for pattern sensitive faults which depend on the weight of the cell contents in the base cell's row and column neighborhood. The speed of the built-in test circuit and the economy of the test equipment make their O(n^1.5) algorithm practical.

Regener (1988) proposes an on-chip sequence generator which cycles through all possible bit patterns for an n bit RAM. The generator circuit is fast and small, but any test pattern which cycles through all transitions will have n·2^n steps. Even the most efficient circuit will quickly become impractical with increasing RAM densities. For example, at 1 ns step time, Regener's algorithm will take 10" years to test a 1K RAM. Perhaps his method of generating test sequences might be successfully applied to some of the optimal O(n^k) test patterns.
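The scale of an exhaustive n·2^n step test can be checked with a few lines of arithmetic. The sketch below assumes that a 1K RAM means n = 1024 bits and that every step takes exactly 1 ns; the function name `log10_test_years` is hypothetical. It works in logarithms because 2^1024 exceeds the range of a floating point number.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def log10_test_years(n_bits, step_seconds=1e-9):
    """Return log10 of the years needed to run n * 2**n test steps."""
    # log10(n * 2**n) computed without forming the huge number itself
    log10_steps = math.log10(n_bits) + n_bits * math.log10(2)
    return log10_steps + math.log10(step_seconds) - math.log10(SECONDS_PER_YEAR)
```

Calling `log10_test_years(1024)` gives the exponent of the test time in years under these assumptions, and smaller values of `n_bits` show how quickly the test time grows with RAM size.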

Fujiwara et al. (1988) survey RAM built-in self test techniques in Japan. Of particular note is the work by Miura, Tamamoto, and Narita (1987). They consider address faults where a given address accesses an incorrect location and the neglected location is not picked up by some other address. Their O(kn^1.5) sequence detects all of these address decoder and local neighborhood pattern sensitive faults.

A further method to reduce testing time, applicable to any of the above BIST techniques, is to partition the RAM of size M into Q modules of size N. Using parallel test techniques on each module, a test pattern which requires O(M^k) operations will now only require O(N^k) operations.
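The saving from partitioning can be made concrete with a small calculation. The sketch assumes the modules are equal in size (N = M / Q), are tested fully in parallel, and that the constant factors hidden by the O notation are equal; the function name `operation_counts` is hypothetical.

```python
def operation_counts(m_words, q_modules, k):
    """Return (unpartitioned, partitioned) step counts for a size**k test."""
    n_words = m_words // q_modules      # size N of each module
    return m_words ** k, n_words ** k   # parallel modules finish together
```

For instance, `operation_counts(1024, 4, 2)` compares a 1024 word RAM tested as a whole against four 256 word modules tested in parallel, and the ratio of the two counts is Q^k.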

A great deal of the literature on built-in self test is aimed at the part screening process. For non-interruptive built-in test we will examine error correcting codes. The next section will address several codes that are intriguing because they are targeted toward the types of faults seen in RAMs.

Error Correcting Codes for RAMs

In this thesis we use only one error correcting code, the modified Hamming code proposed by Hsiao. There are alternative codes which could provide better results than the Hsiao code. For example, Davydov and Tombak (1991 p. 897)


which can be detected. The t-EC/d-UED codes use fewer check bits than the t-EC/AUED codes.

Because of the bursty, asymmetric nature of alpha particle errors, the t-EC/AUED and t-EC/d-UED codes provide an interesting alternative to the modified Hamming SEC-DED code.
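For readers unfamiliar with the SEC-DED baseline, the behavior of single error correction with double error detection can be sketched using an extended Hamming (8,4) code. This is a swapped-in textbook code chosen for brevity, not the Hsiao code used in this thesis; the function names and the bit ordering are assumptions.

```python
def encode(data):
    """Encode 4 data bits as [p0, p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4                    # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4                    # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4                    # covers positions 4, 5, 6, 7
    code = [p1, p2, d1, p4, d2, d3, d4]  # Hamming positions 1..7
    p0 = 0
    for bit in code:
        p0 ^= bit                        # overall parity enables double detection
    return [p0] + code

def decode(word):
    """Return (status, data); data is None for an uncorrectable double error."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos              # XOR of the positions of set bits
    overall = 0
    for bit in w:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1                 # single error; syndrome 0 means p0 itself
        status = "corrected"
    else:
        return "double-error", None      # nonzero syndrome with even overall parity
    return status, [w[3], w[5], w[6], w[7]]
```

A codeword with one flipped bit decodes to the original data with status "corrected", while two flipped bits are reported as a double error without an attempted correction. This code treats errors in both directions alike, which is the property the unidirectional codes above trade away in exchange for detecting bursts.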
