At-speed scan insertion and automatic test pattern...

AT-SPEED SCAN INSERTION AND AUTOMATIC TEST PATTERN GENERATION

OF INTEGRATED CIRCUITS WITH FAULT-GRADING AND SPEED-GRADING

Joseph Fang

B.A.Sc. University of British Columbia, 2000

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ENGINEERING

In the School of

Engineering Science

O Joseph Fang 2005

SIMON FRASER UNIVERSITY

Summer 2005

All rights reserved. This work may not be reproduced in whole or in part, by photocopy

or other means, without permission of the author

APPROVAL

Name:

Degree:

Title of Project:

Joseph Weizhou Fang

Master of Engineering

At-Speed Scan Insertion And Automatic Test Pattern Generation Of Integrated Circuits With Fault-Grading And Speed-Grading

Examining Committee:

Dr. Mirza Faisal Beg Committee Chair Assistant Professor, Engineering Science

Dr. Karim S. Karim Senior Supervisor Assistant Professor, Engineering Science

Dr. Karim Arabi Supervisor Manager, Design Automation, PMC-Sierra Inc.

Date DefendedIApproved:

SIMON FRASER UNIVERSITY

PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.\

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

W. A. C. Bennett Library Simon Fraser University

Burnaby, BC, Canada

ABSTRACT

With the growing complexity of today's integrated circuit designs, engineers have

abandoned the use of pure functional test vectors wherever possible, and adopted

various DFT solutions to make their designs more test-friendly. The most common DFT

approach for digital designs is scan insertion and automatic test pattern generation

(ATPG). ATPG is performed based on fault models associated with the design or gates

within the design. Traditionally, the most popular model is the stuck-at model. However,

as transistor size continues to shrink, new defect mechanisms start to appear that affect

the speed of the design, and so can no longer be properly modelled by this model.

Consequently, a new fault model called transition-delay fault models is created to allow

ATPG to detect at-speed defects. Another model called path-delay fault model is also

created for speed-gradinglbinning and I10 timing characterization on scan-inserted

designs.

As part of an ongoing DFT development for PMC-Sierra Inc., a suite of

automation flows have been implemented to perform AC-Scan ATPG. This includes

transition-delay ATPG with DC top-up ATPG for delay defect detection, path-delay

ATPG for speed-gradinglbinning and I10 timing characterization, and AC-scan ATPG for

RAM interfaces with multi-load algorithm.

DEDICATION

To my parents, my wife and my unborn child.

ACKNOWLEDGEMENTS

I would like to thank Dr. Karim S. Karim for his supervision during the progress of

this project. His willingness to take on the supervisory duties and meaningful advices

during the course of this project is most appreciated. I would also like to thank Dr. Karim

Arabi for his constant supervision and mentoring during the progression of this project

and the past four years. I would not have been able to become a qualified and

successful DFT engineer without his guidance.

TABLE OF CONTENTS

. . Approval ........................................................................................................................ 11

... Abstract ........................................................................................................................ III Dedication ..................................................................................................................... iv

Acknowledgements ....................................................................................................... v

Table of Contents ........................................................................................................ vi ...

List of Figures ............................................................................................................ VIII

List of tables ............................................................................................................ ix

List of Abbreviations ..................................................................................................... x

Chapter 1 1 . 1 1.2

1.2.1 1.2.2

1.3

Chapter 2 2.1

2.1 . 1 2.1.2 2.1.3

2.2

Chapter 3 3.1 3.2

3.2.1 3.2.2 3.2.3 3.2.4 3.2.5

3.3 3.4

3.4.1 3.4.2 3.4.3

INTRODUCTION AND BACKGROUND .................................................. 1 General Overview of Integrated Circuit Testing ............................................ 1 Automated Testing: Scan Insertion and Automatic Test Pattern

................................................................................................... Generation 3 .............................................................................................. Scan Insertion 4

Automatic Test Pattern Generation (ATPG) ................................................. 5 Need for At-Speed Testing .......................................................................... 7

PROJECT DESCRIPTION ....................................................................... 9 Project and Report Overview ....................................................................... 9 Develop AC-Scan ATPG flow for physical defect detection .......................... 9 Develop AC-Scan ATPG flow to qualify operating speed of devices .......... 10 Develop AC-Scan ATPG flow for at-speed test of RAM interface ............... 10 Project significance .................................................................................... 10

................................................................. TRANSITION-DELAY ATPG 12 ......................................................... Transition-Delay ATPG Methodology 12

................................................... Transition-Delay ATPG Flow Description 14 Generate all files to automate flow ............................................................. 15 Identify and group functional clock domains ........................................ 16 Performing ATPG in AC transition-delay mode .......................................... 18

........................................................... Fault grading in AC and DC modes 18 Perform ATPG in DC (stuck-at) mode for coverage top-up ........................ 19

............................................................................................... Benchmarks 19 ............................................................ Scan Strategy for AC-Mode ATPG 20

........................................................................ Pin sharing for scan clocks 20 Testability with block-level ATPG ............................................................... 21

............................................................................... Launch-on-shift ATPG 23

Chapter 4 4.1 4.2

4.2.1 4.2.2 4.2.3 4.2.4

4.3 4.3.1 4.3.2 4.3.3

4.4 4.4.1 4.4.2 4.4.3

Chapter 5 5.1

5.1.1 5.1.2

5.2 5.2.1 5.2.2 5.2.3

5.3

Chapter 6

Chapter 7

PATH-DELAY ATPG ............................................................................. 25 Path-Delay ATPG Methodology for Speed-GradingIBinning ...................... 25 Path-Delay ATPG Methodology for Input~Output Characterization ............. 28 Input setuplhold characterization ............................................................... 29 Output data propagation characterization .................................................. 31 Output tristate characterization: 1 C + Z .................................................... 32 Output tristate characterization: 0 C +Z .................................................... 34 Path-Delay ATPG Flow Description ........................................................... 35 Generate all files to automate flow ............................................................. 36 Performing ATPG in AC path-delay mode ................................................. 37 Pattern post-processing ............................................................................. 38 AC Path-delay ATPG Issues ...................................................................... 38 Acceptable test coverage ........................................................................... 39 Clock source for I10 characterization ......................................................... 40 Path Selection Issues Related to Speed-GradingIBinning .......................... 40

AT-SPEED ATPG ON RAM INTERFACE .............................................. 43 Overview of RAM Interface ........................................................................ 43 MEB: Active-Low Module Enable .............................................................. 44 OEB: Active-low Output Enable ................................................................ 45 At-Speed ATPG Methodology for RAM Interface ....................................... 45 Testing DIN of RAM ................................................................................... 45 Testing ADDR of RAM ............................................................................... 47 Testing DOUT of RAM ............................................................................... 49 RAM ATPG Issue(s) .................................................................................. 52

TEST-CASE RESULTS ......................................................................... 53

CONCLUSION .................................................................................... 56

................................................................................................................ Bibliography 59

LIST OF FIGURES

Figure 1 : Scan insertion through replacement and stitching of flip-flops ....................... 4

Figure 2: AC-Scan Transition-delay ATPG example .................................................. 13

Figure 3: AC-Scan Transition-delay ATPG flow chart ................................................ 15

........................................................................ Figure 4: Clock sharing in scan mode 21

....................................................... Figure 5: Launch-on-capture VS . launch-on-shift 23

........................................................... Figure 6: AC-Scan path-delay ATPG example 27

Figure 7: AC-Scan input setuplhold characterization ................................................. 30

................................... Figure 8: AC-Scan output data-propagataion characterization 32

.................................................... Figure 9: Output tristate characterization (1 and Z) 33

Figure 10: AC-Scan output tristate characterization (0 and Z) ...................................... 34

Figure 11 : AC-Scan path-delay ATPG flow-chart ......................................................... 36

Figure 12: Simplified diagram of RAM interface ........................................................... 44

Figure 13: Testing DIN of RAM .................................................................................... 46

Figure 14: Testing ADDR of RAM ................................................................................ 48

Figure 15: Testing DOUT of RAM ................................................................................ 50

Figure 16: AC Transition-delay coverage VS . pattern count ........................................ 55

LIST OF TABLES

Table 1 : Test-case results for AC-scan Transition-delay ATPG .................................... 53

Table 2: Test-case results for AC-scan Path-delay ATPG ............................................ 55

LIST OF ABBREVIATIONS

ATE ATPG BlST DFT DLL DUT EDA 110 IC PI PLL PO RAM SE s o c STA

Automated Test Equipment Automatic Test Pattern Generation Built-in Self Test Design for Testability Delay Lock Loop Device under Test Electronic Design Automation InpuVOutput Integrated Circuit Primary Input Phase lock Loop Primary Output Random Access Memory Scan-Enable System-on-a-Chip Static Timing Analysis

CHAPTER 1 INTRODUCTION AND BACKGROUND

1.1 General Overview of lntegrated Circuit Testing

A majority of today's digital electronic components consists of collections of

Silicon-based lntegrated Circuits (ICs). Generally speaking, each piece of IC is

designed to perform certain digital or logical functions at a particular speed. However,

manufacturing process variations and/or contaminations during IC fabrication may cause

the actual silicon to deviate from the targeted performance or malfunction altogether.

Hence, it is important that every fabricated piece of IC be thoroughly tested to guarantee

its quality. This is the driving force behind a variety of IC test approaches.

Similar to testing of any system, the basic idea of digital IC testing is to apply

stimuli to the device under test (DUT), and compare the actual output of the DUT with

the expected response. The collection of input stimuli and associated expected output

responses are called test vectors. More specifically, in a set of test vectors, different

combinations and sequences of logic zeroes and ones are applied onto the primary input

(PI) ports of the DUT. These sequences are meant to sensitize portions of the DUT's

internal circuitry, causing it to change the logic states of related primary output (PO)

ports. The vectors contain the expected response of a working DUT based on the input

stimuli, and compares the actual output logic values with the expected values. A

mismatch between the expected and actual circuit behaviour would indicate that a faulty

DUT has been detected and that it is to be discarded.

To measure the effectiveness of a set of test vectors, the notion of test coverage

needs to be introduced. Test coverage is a measurement of the percentage of the

circuit's total gates that is properly tested with the test vectors. Ideally, this needs to be

100% such that every gate in the design is tested to guarantee that the DUT is validated

in its entirety. However, this is not always a realistic goal. In general, 95% is considered

acceptable for ICs implemented in older technologies, and 97%-99% is needed for ICs in

more current technologies.

A piece of IC is tested by being placed on an automated test equipment (ATE),

also called production tester.[l] The I10 ports of the IC are connected to tester channels

of the tester. Furthermore, test vectors are stored in the tester, which applies the stimuli

to the IC's input ports and compares the output response from the IC's output ports

through the tester channels. There is only a finite amount of storage space in an ATE,

meaning that the desired test coverage must be achieved by a finite number of test

vectors. This number, defined as vector count, must be less than the physical memory

limitation of the tester to be fully loaded into it. In a way, vector count is a benchmark of

the vector efficiency. Older testers can store up to eight million test vectors while newer

ones have memory upper bounds at 32 million vectors or higher. Aside from avoiding

reaching the tester memory limit, it is always good practice to keep the vector count low

due to the high cost of tester time.

Before the advent of any structured testing approaches, test vectors are often in

the form of manually written functional testbenches. These testbenches are designed to

sensitize as many gates in the design as possible through functional operations of the

device. They are usually derived from simulation vectors written as part of design

verification to confirm behavioural specification of the design. [I] This kind of test

approach is often inefficient and ineffective due to the difficulties to cover all corner

cases of the design with limited availability of vector space and manpower, especially for

today's multi-million gate designs. A large part of the inadequacies of these manually

written vectors, also called functional test vectors, is due to the existence of sequential

elements such as flip-flops and latches which create depth in the design: In an edge-

triggered design, in order to test a gate embedded in the core of a block of digital circuit,

the input stimuli must traverse through multiple layers of flip-flops to sensitize the gate,

and similarly the output of the gate must travel through multiple layers of flip-flops before

arriving at a primary output to be observed. As designs increase in complexity and

levels of sequential elements grow, vector count increases exponentially due to the

limited number of 110s available to traverse the depth of the core logic. It is clear that IC

testing will become unmanageable without a more structured as well as more automated

approach.

1.2 Automated Testing: Scan Insertion and Automatic Test Pattern Generation

As ICs become increasingly more difficult to test through conventional functional

vectors, IC engineers start to design test structures into digital circuits to help ease the

burden of testing. This concept of making circuits more testable is called Design-for-

Testability (DFT). As mentioned, the major contributor to the complexity of the functional

test vectors is the existence of flip-flops and the stages of pipelines they create.

However, this issue can be resolved if the flip-flops can be made directly controllable

and observable. This idea is realized through scan-insertion, a conceptually simple and

highly automated DFT implementation method, and one of the most useful solutions

offered by DFT to make digital IC testing manageable. In today's IC design industry, a

large part of quality assurance of digital ICs is based on running scan test vectors

through a scan-inserted (also called scan-stitched) design. [ I ] From the view of a

designer, this involves two main steps: scan insertion, and automatic test pattern

generation (ATPG).

1.2.1 Scan Insertion

Figure 1: Scan insertion through replacement and stitching of flip-flops

DIAGRAM A: pre-scanned gate-level description

scan h P qqjqT

post-scanned

port

Assume all CP pins of all flops are toggled by the same clock source

SE diagram B is the scan-inserted version of

~~nnnn r u m diagram A.

diagram B contains a single scan chain shift capture shift stitching together all 5 flops (see gray line)

SE=O for diagram B to operate identically to diagram A

The following is a description of scan insertion: After a design has been

synthesized down to a gate-level description (also called a netlist), an electronic-design-

automation (EDA) tool such as DFTAdvisor from Mentor Graphics or DFTCompiler from

Synopsys Inc. is used to link together all edge-triggered sequential elements to form

long chains of shift registers called scan chains. [2] This process requires the

introduction of a DFT-specific input pin called scan-enable (SE). [ I ] Usually, an SE-

controlled Multiplexer is inserted onto the data input of every D-flop such that the

functional input is routed into the flop when SE is assigned '0' and another scan-mode

data input, normally called scan-in, is routed in when SE is assigned '1 '. The scan-chain

is constructed by connecting the output of one flop to the scan-in of the next flop,

forming a long shift-register when SE equals '1 '. The scan-in of the first flop in the chain

is driven by a device-level scan-in port; and the output of the last flop in the chain drives

a device-level scan-out port. SE is held at '0' while the device is operating in functional

mode so that functional data paths are carrying data across. Above is a diagram

depicting scan insertion.

1.2.2 Automatic Test Pattern Generation (ATPG)

With the scan chains constructed, the flops are transformed from deeply

embedded internal nodes of the circuits into control and observe points for testing. This

is accomplished by asserting SE to '1' and pumping data onto each flop in the scan

chain(s), with continuous clock pulses, from the device-level scan-in port(s). The data

shifted into the flops (also called scan cells) are the input stimuli that launch into the

functional paths from the flops' outputs and arrive at the data-input of other scan cells,

sensitizing combinational gates along the way. Then, SE is de-asserted to break the

scan chains and connect the functional paths, and clock signals are pulsed to capture

the data traversed through the functional paths onto the scan cells. Finally, the same

shift operation is done to pump the captured data out through the scan-out port. The

data shifted out make up the device's response to the input stimuli from the shift-in

operation and are compared for mismatches. Because of the structured nature of scan-

testing, today's EDA tools, such as FastScan from Mentor Graphics or TetraMAX from

Synopsys Inc., make use of this shift-and-capture sequence to automatically create test

patternslvectors to validate the circuits efficiently, usually achieving 95% test coverage

or higher at reasonable vectorlpattern count. Because of its consistent delivery of

results, the approach of using EDA tools to automatically generate scan vectors is the

mainstream method of creating test vectors today.

At this point, it is worthwhile to define some of the commonly used terms for

ATPG: Each clock pulse while SE equals '1' forms a shift vector; Each clock pulse

while SE equals '0' is called a capture vector or capture cycle; The entire sequence of

continuous shift operations that fully load andlor unload the scan chains in the design is

called a shift sequence or loadlunload operation; A shift-sequence and the capture

vector(s) immediately trailing it together form a test pattern or scan pattern.

In order to perform test patternlvector generation, the ATPG tools use fault

models to describe the behaviour of physical defects. Traditionally, the most commonly

used model is called the stuck-at model, which translates all physical defects of digital

circuits into particular nodes or pins of digital gates being unable to make transitions and

so permanently "stuck" at a particular logic level. [3] Since the model assumes that

faulty gates are permanently connected to VDD or VSS, scan-testing can be done at

very low speed as the stuck-at values are always present without regards to clock

frequency. This model has been one of the best tools for testing IC, but its usefulness is

diminishing as IC technology makes its advances into 0.1 3um and 90nm.

1.3 Need for At-Speed Testing

As advancements are made to design, development and fabrication of ICs, chips

are made to run faster and faster and are designed to contain more and more complex

functionalities. [4] This drive for faster performance and System-On-a-Chip (SoC)

design structure pushes the boundary of IC fabrication, reducing the transistor size.

From a test perspective, the ever-shrinking transistor in deep-submicron technology has

caused the emergence of new defect mechanisms such as resistive via. Faulty gates

with this type of defect mechanism exhibit the behaviour of very slow data transitions. [5]

In other words, the gate still correctly makes the transition, just not at a high enough

speed. To detect this type of faults, a new DFT strategy is needed to test the device at

its functional operating frequency. [6] One has been derived from the conventional

scanIATPG approach: at-speed scanIATPG, also called AC-scanIATPG. To distinguish

itself from AC-Scan, the conventional scanIATPG methodology with stuck-at fault model

is also called DC-ScanIATPG.

Clearly, this new type of defects cannot be modelled by "permanently stuck-at"

logic levels, and so requires a set of new "temporarily stuck-at" fault models that

accomplish scan-testing at or close to the device's functional frequency of operation.

There are two fault models introduced: transition-delay model, and path-delay model. [7]

The current industry trend is to use these models to augment stuck-at models.

However, they soon will become the dominant fault models.

With Transition-delay fault model, a digital gate is modelled such that a transition

(0-to-1 or 1 -to-0) on its inputs must result in a corresponding transition on the output. [7]

The ATPG tool running with this type of modelling creates test patterns to launch a

transition (rather than a constant value as in the case of DC-scan) into the targeted input

of the gate, then captures the corresponding post-transitioned value of the gate's output

onto a scan-cell, and finally shifts out the scan-cell value to verify the transition. Path-

delay fault model is similar to transition-delay model. The difference is that, with path-

delay model, the transition is defined on an entire path (i.e. from source flop to

destination flop through all combinational gates in between). This fault model is used to

characterize an entire path. It is ideal for speed-binning or speed-grading in which the

critical paths are selected to be sensitized by the ATPG tool. The tool then tests if a

transition on the source flop can result in a corresponding transition on the destination

flop within the specified speed. A failure does not necessarily mean physical defect, but

rather than the "faulty" chip may belong to a lower grade of ICs and sold as a less

expensive part.

CHAPTER 2 PROJECT DESCRIPTION

2.1 Project and Report Overview

The project to be described by this report is on the implementation of an industry-

standard AC-Scan and ATPG automation flow to enable at-speed test for PMC-Sierra

Inc.. PMC-Sierra is a "fabless" semiconductor design house. Its core technology

revolves around developing ICs that enables transmission, processing and storage of

Internet data, as well as general-purpose microprocessors. The project enables at-

speed testing of ICs using existing DFT structures contained within the ICs, including

scan-flops and various clock gating architectures to provide clock controllability during

testing. This project is further broken down into the following components each of which

will be discussed in detail in a later chapter of this report:

2.1.1 Develop AC-Scan ATPG flow for physical defect detection

This component requires the implementation of a push-button automation flow to

automatically create test patterns with the transition-delay fault model. It is described in

Chapter 3. From the perspective of this report, issues related to design partitioning in

the context of AC-scan is of more interest than the actual coding of scripts to run the

ATPG tool. Therefore, the discussion will focus on the design intrusions, automation

flow and pitfalls.

2.1.2 Develop AC-Scan ATPG flow to qualify operating speed of devices

This component is discussed in Chapter 4. There are two sub-components in the

discussion both of which revolve around ATPG with path-delay fault models: One is the

normal ATPG of internal (flop-to-flop) data paths for speed-gradinglbinning. The other is

AC-Scan ATPG on paths connected fromlto device-level primary 110s to verify their

timing specifications on silicon (also called 110 characterization). There are some design

related issues here such as access of delay-locked loop (DLL) in AC-Scan mode that

are also explored. Also, the choice of data-paths for speed-grading is discussed.

2.1.3 Develop AC-Scan ATPG flow for at-speed test of RAM interface

Neither of the two components above supports at-speeding testing of RAM

interfaces due to the complexity of the ATPG algorithm when testing RAMs. Chapter 5

describes an ATPG flow based on AC-Scan that specifically targets data and address

buses of RAMs. More specifically, the ATPG algorithm of the ATPG tool itself is

described, and some design-related issues are also discussed.

2.2 Project significance

This project bares a high degree of significance to PMC-Sierra. In a cost-driven

business environment, it is important to keep every working part to maintain good profit

margin for the company, and discard every defective part to ensure customer

confidence. This means there is to be no or at least next-to-none test-escapes and

false-rejects. This need is further magnified by the lower yields due to immaturity of the

newer technologies. Lower yield increases the likelihood of faulty parts and therefore

the opportunity for test-escapes, hence placing a tougher constraint on test coverage

which is directly proportional to vector count. Even though the financial benefits of

testable designs are not easy to quantify [8], the use of structured tests will shorten the

design cycle and result in overall savings to the company. [9] The generation and use of

test patterns is a major contributor to the overall cost of IC development. ATPG itself is

a non-recurring cost that only applies during the design phase. [9] This includes cost of

human engineering resource, license cost for the usage and maintenance of ATPG

tools, computer resources. The more major cost here is the actual test time of each

fabricated silicon as it is a recurring cost that applies to every piece of IC to be shipped

to customer or discarded due to detected defects. [9] Therefore, it is vital that a straight-

forward ATPG flow is put in place to ensure efficient vector count at acceptable test

coverage.

CHAPTER 3 TRANSITION-DELAY ATPG

3.1 Transition-Delay ATPG Methodology

As mentioned, Transition-delay fault model is used to detect at-speed

manufacturing defects on silicon. With this fault model, a digital gate is modelled such

that a transition (0-to-1 or 1 -to-0) on its inputs must result in a corresponding transition

on the output. The ATPG tool is designed to understand the model, and creates two

consecutive clock pulses at fast clock frequency to make up the capture cycles, a

procedure called double-pulse. [I 01 Using shift-in operation, it first presets the input of

the gates with the pre-transition value in preparation for the double-pulse, then launches

the post-transition value into the gate from a scan cell with the first clock pulse to cause

a transition on the gate's output, and tries to capture the final value on the gate's output

with the second pulse onto a scan cell connected to the output of the gate. Finally,

contents of the scan chains are shifted out for comparison. Please note that the model

itself does not specify the propagation speed of the input transition to the output, but

rather just specifies the values before and after the transition. As long as a transition on

the gate of interest is properly launched and captured through the double-pulse of the

clock, the fault is considered to be detected at-speed. This way, it is up to the designer

to choose a clock frequency for the double-pulse and the tool to implement it. The

following diagram provides an example of transition-delay ATPG and the method of

detecting at-speed faults using double-pulse of the clock.

Figure 2: AC-Scan Transition-delay ATPG example

flop A (value on Q) flop D (value on Q) after shill 1 after shill 0 after launch pulse don't care after launch pulse 1 after capture pulse don't care after capture pulse don't care

scan out port I=>

flop E (value on Q) after sh~ft don't care after launch pulse 0 after capture pulse 1

flop B (value on Q) flop C (value on Q) after sh~ft 1 after shlft 1 after launch pulse don't care after launch pulse 1 after capture pulse don't care after capture pulse don't care

The diagram above shows five scan flops linked together into one single scan

chain. SE pin of all flops are connected together, as with CP pins. The 0+1 transition

on the input of the AND gate connected to Q of flop D is targeted transition-delay ATPG.

The basic idea here is that the pattern must launch a 040-1 transition on the targeted

input of the AND gate; all other inputs to the AND gate must stay transparent to allow

the targeted transition through the gate; finally the destination flop must capture the post-

transitioned value so that it can be shifted out. Here is how the test is accomplished in

detail: A '0' is shifted into flop D to drive the targeted input of the AND to '0'; A '1' is

shifted into flop C such that the OR gate drives a '1 ' into the non-targeted input of the

AND, hence allow the preset value '0' to be observed by flop E. Then SE (also called

scan-en) drops to '0' to activate functional paths. The launch clock pulse launches the

'1' shifted into flop A across flop D to start the propagation. This pulse also launches the

'1 ' shifted into flop 6 across flop C to maintain the '1 ' on the OR gate. The capture

pulse, which follows immediately after the launch pulse to form the at-speed clock

period, captures the '1' through the AND gate from flop D onto flop E. Then the shift-out

operation will pipe the data on flop E out to the scan-out port for observation. In the

presence of an at-speed defect on the targeted input of the AND gate, the transition

launched from flop A would have been too slow to register the '1' on flop E, causing a

mismatch.

3.2 Transition-Delay ATPG Flow Description

With the concept of AC-Scan transition-delay ATPG understood, an automation

flow is created to generate test transition-delay patterns on designs requiring at-speed

defect detection. Generally speaking, the flow specifies design-specific double-pulse

sequences and relies on the ATPG tool to generate patterns accordingly, but there are

more details involved. The diagram below is a flow chart of AC-Scan ATPG flow for

Transition-Delay model. It is broken down into 6 stages each of which is described more

closely in the paragraphs following:

Figure 3: AC-Scan Transition-delay ATPG flow chart

input design-info file generate all li/es automate flow with gray color

I

.c Trans-dly ATPG

at~-transdly*.dof I

(multiple sessions, one for each domain

1 . + 1

group) I WGL att tern. (one ~ e r

script: identify flops I D func. clock domains

Fault grading

I in AC and DC modes

atpg-faultgrade.log: contams final coverage % for AC pattern

1 I

xx-*.dof: script to X-out flops not in specified domain. (one file per non-conflicting domain grp)

C topup. done

-

Create target fault list for DC top-up

faultgrade-dc-*.fault: DC stuck-at faults not covered by AC patterns subjected to DC mode fault gradtng.

t atpg-dc-topup.log: contains

done

3.2.1 Generate all files to automate flow

This is stage one of the flow. Since the flow is automated and generic across all

designs, user only needs to enter information needed by the flow in a design-info file.

This file is fed into a script that generates all files necessary to accomplish downstream

tasks. There are several advantages to this type of design methodology:

Automated design flow shortens development time and increases efficiency, allowing

designers to concentrate on tasks specific to the design.

A single data file at the start of the flow creates a clear point of intrusion, making

debugging easier.

Running through ATPG requires expertise in the area of DFT. The flow is designed

to provide a higher level of abstraction to allow designers without relevant knowledge

andlor experience to still accomplish their jobs. In essence, automation flows are

designed to shield designers from the intricate details of the tools contained within.

This creates more clear-cut knowledge boundaries and better technical organization.

3.2.2 Identify and group functional clock domains

This is stage two of the flow. In this stage, static timing analysis tool is used to

classify different branches of the scan-mode clock source into functional clock domains.

A piece of integrated circuit can be partitioned according to the clock source of a

collection of sequential elements. A group of flops driven by the same clock source in

normal functionalloperational mode, along with the combinational logic surrounding the

flops, forms a single partition, called a functional clock domain. Each functional clock

domain toggles at its own specified speed according to design requirements. Therefore,

to perform at-speed test, it is necessary to be able to exercise each domain at its own

clock frequency. This presents a unique challenge in scan-mode as, due to pin limitation

and characteristics of clocking strategy, several functional domains may be driven from

the same scan-mode clock sourcelpin, making it hard to target only a particular portion

that corresponds to a functional domain. To be more specific, if a scan-mode clock pin

toggles two functional clock domains, one at 77MHz, the other at 31 1 MHz, testing the

31 1 MHz portion of the flop collection at-speed would result in massive false failures in

the 77MHz portion because data paths formed by the flops in the 77MHz domain are not

designed to run at any higher speed. Alternatively, testing the entire collection of flops at

77MHz would mean that the 31 1 MHz portion is not stressed enough to present realistic

test coverage.

Therefore, this stage of the flow tries to partition and cleverly group the different

functional domains within the scan-mode clock source(s), and then builds the ATPG

invocation files to help target only a single group of non-conflicting functional domains for

each ATPG run coming up later. Each invocation file targets a particular domain group

by masking out all flops not in the targeted group.

The grouping of functional domains depends on pin equivalency of the scan-

mode clock sources/pins. The default behaviour is that domains can only be grouped

together within a single ATPG run if they are all driven from different scan-mode clock

pins and none of the clock pins are pin-equivalent (i.e. none are forced to toggle at the

same time during capture mode of scan operation). The following example illustrate this

idea better:

A block of circuits has five functional clock domains (A, B, C, D and E ) and three

scan mode clock pins (clkl, clk2 and ecbi-wrb), and following configurations:

Clkl controls functional domain A during scan testing.

Clk2 controls functional domains B, C during scan testing.

Ecbi-wrb controls functional domain D, E during scan testing.

Clkl and clk2 are pin-equivalent (i.e. they are to toggle in identical fashion).

Therefore, according to the rules above, here are the clock groups:

group 1 : toggle clkl and ecbi-wrb to target domains A and D respectively.

group 2: toggle clk2 and ecbi-wrb to target domains B and E respectively.

group 3: toggle clk2 to target domain C

Each group will become an ATPG run in the next stage.

3.2.3 Performing ATPG in AC transition-delay mode

This is stage three of the flow. In this stage, ATPG is run for each group of

functional clock domains using the generated files from stages one and two. If run

successfully, each ATPG will result in the generation of a pattern file containing AC-scan

transition-delay patterns targeting a specify group of clock domains.

3.2.4 Fault grading in AC and DC modes

In this stage, the patterns generated from the previous stage is fed back into the

ATPG tool to find the total coverage offered by the entire collection of patterns. This is a

process known as fault-grading. Even though the intended targets of all ATPG runs are

mutually exclusive, some tested faults, such as the faults in the scan chains, are still

accounted for in more than one run. Therefore, fault-grading is not as simple as

summing together the coverage number of all runs.

Fault-grading is performed with AC transition-delay fault model to produce the

final AC coverage percentage, as well as with DC stuck-at fault model to produce the

achieved DC coverage embedded in the AC pattern. [7] Also, a list of DC-mode non-

testable faults is written out in preparation for DC (stuck-at) pattern top-up.

3.2.5 Perform ATPG in DC (stuck-at) mode for coverage top-up

Stage four of the flow created coverage percentage of the AC patterns in DC

mode as a side benefit. At the same time, a list of non-testable faults is in DC mode

generated. In stages five and six, this list is collected to form the target fault list for DC-

mode pattern top-up. The purpose of this top-up is to save tester memory and test time

by avoiding generating full DC patterns. In stage six, ATPG is invoked in DC mode

using the generated invocation files from stage one and the fault list from stage five.

The run should result in the generation of a DC pattern file targeting the fault list. The

final total DC coverage is recorded in the log file. This completes the flow.

3.3 Benchmarks

Typical AC coverage for a block with 95% DC coverage is between 60% to 85%.

AC patternlvector count is usually between three to five times of the DC counterpart.

Here are some of the reasons for the lower coverage in AC when compared to DC:

Reset lines are not tested. This is acceptable because reset typically only requires

DC coverage.

Cross-clock paths are not tested. Since ATPG targets each functional clock domain,

data paths traversing through the domains are not tested. This is usually an

acceptable limiting factor in coverage target as well, because cross-clock paths are

usually false-paths and not qualified at a particular clock frequency.

Block boundaries are hard to test due to the double-capture scheme. This

characteristic will be explained further in detail in the next section.

4. Transition-delay model is a more complex model than the stuck-at model because it

involves setting up a targeted node with two opposing logic values to test for any one

fault, whereas stuck-at model only requires a single value to be propagated onto the

node. This added complexity inherently results in negative impact on the

performance of the ATPG tools and therefore pattern generation efficiency and

effectiveness. [7]

5. This flow does not test RAM interfaces. RAM testing requires a more elaborate

ATPG algorithm called multi-load ATPG. In multi-load ATPG, data is written into

selected locations of the RAM in at-speed fashion in one pattern, then is retrieved in

at-speed fashion from the same locations in a later pattern. This requires the ATPG

tool to keep track of the content of RAM at all times. To ensure pattern efficiency on

the logic away from RAM interfaces, RAMS will be tested in a completely separate

AC-scan ATPG run and the coverage gained from there is additive to the results

obtained in this flow.

3.4 Scan Strategy for AC-Mode ATPG

This section introduces some of the scan strategies to enable AC transition-delay

ATPG and DFT in general. More specifically, the necessity of scan clock sharing is

described as well as strategies for coverage increase such as block-level ATPG and

launch-off-shift ATPG.

3.4.1 Pin sharing for scan clocks

As mentioned earlier, a device can be partitioned according to functional clock

domains. Flops inside each functional clock domain are driven by the same clock

source. There are two classes of clock sources, those coming directly from device-level

input ports, and those driven internally by phase-lock-loops (PLLs), clock dividers, or

interrupts which are in the forms of clock lines driven by output of a flop. The lack of

direct top-level pin controllability in the second class of clock sources poses a DFT

problem since a flop without a clock port at device-level cannot be inserted into scan

chains. Therefore, non-timing-critical data ports are selected to multiplex into the clock

path in scan mode and serve as the clock source (see diagram below). This allows the

flops in a clock domain with internal functional clock source to be scanned, hence

increases testability of the device.

Figure 4: Clock sharing in scan mode

device I .

functional path: AND gate to suppress toggle on

I I functional path in scan mode.

aara port- I ,-In\

scanb: c - 1 active-low scan mode 1 pin: 0 = scan mode 1 = functional mode I"'-

3.4.2 Testability with block-level ATPG

Netlists with gate count beyond 1.5 Million are usually too large a target for any

of the currently available ATPG tools. Therefore, to make DFT possible for large

devices, hierarchical DFT is used such that sub-blocks of the device are targeted

individually for scan insertion and ATPG. ATPG creates test vectors which, in essence,

is a testbench containing stimuli to the DUT and expected response from it. In order to

ensure that patterns generated at block level can still be applied at device-level,

functional 110s of the block need to be masked out such that the only useable 110s are

scan-related pins such as scan-in bus, scan-out bus, scan-enable (also referred to as

SE), and scan mode clocks. This type of pin masking is necessary because the only

pins that can be guaranteed by a generic DFT implementation methodology to map one-

to-one from block-level to the device are scan-related pins. All connections of block-

level functional 110s are design-dependent and cannot be used during block-level ATPG

because their top-level wiring is unknown at this stage.

Masking out functional 110 results in a testability hit because inputs will be

launching 'X' into the boundary interface and output cannot be used to observe data

coming out of them. Though, with DFT-friendly partition practices, design can be broken

into block in which the block-level functional 110s are immediately registered by flops on

the boundary of the block. This will minimize the negative impact of testability drop due

to uncontrollable/unobservable functional 110s in DC scan mode since there is minimal

logic on the edge of the block.

However, providing registers for functional 110s alone is not sufficient for

maintaining reasonable AC scan coverage. Because testing circuit at-speed requires

two consecutive clock pulses, the X's on the inputs are launched across the boundary

flops and contaminates logic between the boundary flops and core flops. The impact on

test coverage by this behaviour depends on the amount of logic contaminated by the X's

and can be quite significant. To resolve this issue, the following workaround is

introduced: Scan input boundary flops of the block into a standalone scan chain with a

dedicated scan-enable (SE). For block-level AC-scan ATPG, hold scan-enable of the

input boundary scan chain constantly high such that chain structure is maintained

through the double-pulse of the scan clock(s). This way, value launched into the logic

after an input boundary flop will be data shifted into the previous scan element in the

chain, instead of the 'X' from functional input. Note that since the scan chain is

maintained, scan paths in the chain will be targeted for at-speed test. Therefore it is

important to either specifically exclude the scan paths in ATPG or synthesize the input

boundary scan paths at functional speed.

3.4.3 Launch-on-shift ATPG

So far, the double-pulse clocking scheme discussed is termed as broadside or

launch-on-capture. The name comes from the fact that scan-enable (SE) falls to '0' long

before the start of the launch pulse and stays at '0' through both clock pulses. [7] This

approach best mimics the functional behaviour of the DUT. However, it places strains

on the ATPG tool to back-track through potentially large logic cones to place onto the D

input of flops the correct logic value so that a properly transition can be launched

through the double-pulse. This peculiarity, depending on the capability of the ATPG tool,

can limit test coverage and result in long run time.

Figure 5: Launch-on-capture VS. launch-on-shift

launch-on-capture scan-en I clocking scheme clock

, , trme available for ! uncertainty of scan-en

launch-on-shift clocking scheme m , fl n

The launch-on-shift scheme (as shown in the above diagram) offers a slightly

different way of controlling scan-enable (SE). With this method, scan-enable is held high

through the launch pulse and drops before the capture pulse. Its advantage is that the

launch data becomes the output of the previous scan-element in the scan chain rather

than the output of logical cones in the functional path. This makes ATPG much easier

on the tool because the launch pulse becomes the final shift cycle and the tool can avoid

traversing clouds of combinational logic to derive launch values. The problem with this

approach is the strain put into building a tight clock tree for scan-enable (SE). Because

the double-pulse is issued at-speed, most of the time there is only a few nanoseconds

between the pulses. This is not a lot of time to have scan-enable fully propagated into

the SE pin of every flop in the design. Balancing such a large clock-tree is quite

challenging, and much of the time impossible given the area constraints and routing

congestion. Therefore, launch-on-shift is not widely adopted in the industry.

CHAPTER 4 PATH-DELAY ATPG

4.1 Path-Delay ATPG Methodology for Speed-GradingJBinning

Before diving into the specifics of the speed-gradinglbinning with scan vectors, it

is worthwhile to discuss the specific characteristics of test vectors used for speed-

binninglgrading. A synchronous digital design consists of sequential elements such as

flip-flops and data-paths linking the sequential elements to each other and to device-

level I10 ports. For the discussion of speed-gradinglbinning, the definition of a data-path

can be simplified to be a passage from a source flop to the destination flop through a

collection of combinational gates. Within a clock domain, upon a clock edge, all flops

simultaneously sample data fed into them by the data-path attached to their inputs, and

launch that same data to the paths connected to their outputs. This means that data

sent out from the source flop on a clock edge must arrivekettle at the destination flop

before the next clock edge, or a setup timing violation has occurred. Since data-paths

vary in length, the longest path, called the critical path, determines the gap between two

consecutive clock edges and therefore the operational frequency of the device.

Typically, speed-gradinglbinning vectors are designed to exercise the critical paths in the

design through consecutive clock pulses. By shrinking the duration between the clock

pulses, the exercised paths will have less time to complete their data propagation from

the source flop to the destination flop. The minimum gap between the clock pulses, or

the minimum clock period, such that data can still be transferred across the targeted

data-paths marks the speed or grade of the device. Therefore, for speed-grading, the

same set of test vectors are run on the DUT multiple times, each time with an

incrementally shorter clock period until mismatches occur between the DUT's actual

response and expected outcome. Then the DUT is said to be graded for the most recent

passing clock periodlfrequency. A speed-binning operation is similar to speed-grading,

except with only a few steps of clock period increments. During speed-binning, critical

paths of a DUT is subjected to testing with the same vectors multiple times, each at a

different prescribed clock speed. Each clock speed defines a class of the same device

capable of operating at a specified operation frequency. Therefore, the DUT passing

one speed specification or "bin" but failing the next is placed into the passing bin.

In order to create speed-gradinglbinning test vectors efficiently and automatically,

scan chains must be utilized to convert all flops into control and observe points. This

way, ATPG tool can be used to generate the test vectors automatically. In order to do

that, a new fault model is required: Path-delay fault model.

Path-delay fault model models an entire data path with two faults: A slow-to-rise

fault that states a 040-1 transition on the source flop's output pin must invoke a transition

on the input of the destination flop within a clock period. A slow-to-fall fault that states a

1 -to-0 transition on the source flop's output pin must invoke a transition on the input of

the destination flop within a clock period. [ I 11

Today's ATPG tool is designed to support the path-delay model, and creates two

consecutive clock pulses at fast clock frequency, a procedure called double-pulse. This

process is similar to transition-delay ATPG. Using scan-shifting, it first loadslpresets the

source flop's output with the pre-transition value and its data-input with the value to be

transitioned, then with SE de-asserted, launches the transition value with the first clock

pulse of the double-pulse, and tries to capture the transition value with the second pulse

on the destination flop. Then content of the destination flop (which is a scan cell inside a

scan chain) is shifted out for comparison. Please note that the model itself does not

specify the clock speed, but rather just specifies the values before and after the

transition. As long as a transition on the data-path of interest (most likely a critical path)

is properly launched and captured through the double-pulse of the clock, the path is

considered to be sensitised at-speed. This way, it is up to the designer to choose a

clock frequency of the double-pulse and the tool to implement it. The following diagram

provides an example of path-delay ATPG and the method of detecting at-speed faults

using double-pulse of the clock.

Figure 6: AC-Scan path-delay ATPG example

flop A (value on Q): after sh~ft: 1

flop D (value on Q):

after launch pulse: don't care after shift: 1

after capture pulse: don't a r e alter launch pulse: 1 after capture pulse: don't care

scan In port I=>

0 to 1 transition

after capture pulse: 1 flop B (value on Q): after shift 1

flop C (value on Q):

after launch pulse: don't care after shift: 0

after capture pulse: don't care after launch pulse: 1 aftw capture pulse: don't care

The diagram above shows five scan flops linked together into one single scan

chain. SE pin of all flops are connected together, as with CP pins. The targeted fault is

the slow-to-rise fault on the data-path from flop C to flop E through the OR gate and the

AND gate. To test for this fault, a 0+1 transition on the Q of flop C needs to be created

and launched through both OR-AND gates through double-pulse of the clock, and the

post-transition value ('1 ') needs to be captured into flop E after the end of the double-

pulse. Therefore, here are the values on the flops after shift-in:

A '0' is stored in flop C through scan shifting to serve as the pre-transition value.

A '1' is stored in flop B also through scan shifting, and therefore appears at data-

input of flop C, to serve as the post-transition value. When SE drops to 'O', this '1' is

launched across flop C during the first pulse of the double-pulse, then propagates

through the OR and AND to be captured onto flop E on the second pulse of the

double-pulse.

A '1' is stored through scan shifting in flop D to hold the AND gate transparent.

A '1' is stored in flop A through scan shifting, and therefore appears at data-input of

flop D, to be launch across flop D during the double-pulse. This maintains the

transparent state of the AND gate so that data can propagate through it from flop C.

4.2 Path-Delay ATPG Methodology for Input/Output Characterization

In order for a piece of IC to fit into a board-level system, it must respect the input

and output timing of its neighbouring ICs to properly communicate with them. For

example, data-out bus of a device is usually specified to send out new bits close to the

rising edge of the clock-output signal. The gap between clock edge and the data-edge is

usually part of the timing specification and is strictly enforced by the design of the

device. With this timing behaviour clearly defined, the downstream module receiving

data from the device can be designed to sample its corresponding inputs accordingly

based on clock arrival. In order to verify that all I t0 timing specifications have been

satisfied, test vectors need to be created to sensitize the data-paths on the boundary of

the device.

I10 characterization vector creation on a device with scan chains properly

constructed can be accomplished by extending the capability of AC-Scan path-delay

ATPG. With minor modifications to the ATPG tool's invocation and some post-

processing of the pattern files generated, one can characterize device-level 110s by

feeding to the ATPG tool data-paths from inputs to flops andlor from flops to outputs.

When patterns are created to properly sensitize these boundary paths, Test Engineer

can move the data edges of the 110s to properly characterize their timing.

The following shows the exact operation of the various types of I10

characterization. The actual pattern generation will follow the waveforms described

here. In general, the patterns follow the sequence of multi-cycle shift-in operations, then

a launch cycle followed by a capture cycle to drivelsample the 110s and the associated

flops, then finally a multi-cycle shift-out operation.

4.2.1 Input setuplhold characterization

Timing of data transmitted into a device's input ports is usually specified

as time between data transition before the clock edge (setup time) and after the

clock edge (hold time). Patterns need to be generated to verify that the device

can properly sample incoming data on the input ports when they are sent in

according to the setup and hold timing specifications.

Figure 7: AC-Scan input setuplhold characterization

functional Input to be [

char'ed

DEVICE

scan-out

.c Assume all flops are clocked by sys-clk in the waveform below.

Upon rising edge of sys-clk, assume each flop is driven by D input when SE=O; otherwise, take in Q of previous flop in the scan chain.

dont-cares 1 \ dont-cares

functional m u t

scan-in tester cycle

shift in - . launch. -capture _-

s t i r h i \ capture o dont-cares dont-cares

4

SYS-CI~ J n l nI n l I

The diagram above shows the operations of input characterization. During input

characterization, the shift-in operation usually sets up the data-path from the functional

input to the destination flop such that this path can be sensitized properly when SE

drops to '0'. Then SE is set to '0' at the beginning of the (empty) launch cycle. Then in

the capture cycle, with the data-path ready to be sensitized, a 01)11)0 or 1 1)01)1 data

sequence is applied to the functional input such that the triggering clock edge is situated

in the middle value of the sequence. The middle logic value is captured into the

destination flop by the clock pulse. Finally, the captured value is shifted out for

comparison. The test pattern is generated with the data edges far away from the clock

edges to guarantee that the pattern passes. Then test engineer will rerun the test

pattern by moving the data edges closer to the clock edge until a failure occurs. The

shift out

location at which the failure occurs provides the setup and hold time achieved by the

device. This can then be compared with the timing specification intended by the design.

4.2.2 Output data propagation characterization

Timing of data transmitted out of a device's output ports is usually specified as

the maximum allowed time delay of the data-out value compared to a clock signal

(output propagation time). In cases where an entire data bus is transmitting, there is

usually also a timing specification on the maximum skew among every index of the bus,

i.e. all output transition must occur relatively close to each other within a time window.

To verify either timing specification, patterns need to be generated to sample the output

ports after a clock edge. Then, the longest delay subtracting the shortest delay provides

the skew measurement.

The diagram below shows the operations of output data transition

characterization, where the data path from the flop to the output port through CLI is to

be timed. During this characterization, the shift-in operation loads the data-out flop with

the value before the transition (in case of 0-to-1 transition, '0'). Also, the values loaded

into the scan chains (not shown in diagram) provide the post-transition value onto the D

pin of the data-out flop. This way, during the launch cycle, when SE drops to '0' and the

clock is pulsed, the value on the D input is launched across the flop to start the

propagation. A strobe point is placed in the capture cycle to sample the post-transition

value on the output port. A test engineer can move the strobe point forward in time to

find the time of failure. The time of failure is the measured output timing on the port and

can be compared with the intended timing specification.

Figure 8: AC-Scan output data-propagataion characterization

0t01 transition dont-cares

functional / output strobe output I

scan-out

Assume all flops are clocked by sys-clk in the o $ ~ : ~ ~ e waveform below.

char'ed Upon rising edge of sys-clk, assume each flop is driven by D input when SE=O; otherwise, take in Q of previous flop in the scan chain.


shift in launch. .capture -- shift out . . sys-c~k j j n / n l n / n I I n j n /

sample \ ! too transtion dont-cares dont-cares

I

I I I I I I 1

Throughout both the launch and capture cycles, the patterns generated by the

ATPG tool need to hold the active-low output-enable at '0' to ensure that the values on

data-out can propagate out without obstruction. The placement of this type of

sensitization behaviour can be done in the ATPG tool through provision of ATPG

constraints within the tool.

I I

4.2.3 Output tristate characterization: 1 C + Z

Often for tri-stateable output ports or bi-directional ports, there are timing

specification on the time needed to enableldisable the output. In this characterization

exercise, transition between tristate and '1' of the port can be timed.

Figure 9: Output tristate characterization (1 and Z)

scan-out

Assume all flops are clocked by sys-clk in the o$$yLe waveform below.

char'ed Upon rising edge of sys-clk, assume each flop is driven by D input when SE=O; otherwise, take in Q of previous flop in the scan chain.


shift in - launch. -capture -- shift out . S Y S - C I ~ J n l n

ztol trans~tion dont-cares

functional j outpu; strobe ,

output sample '\, I

I ~ O Z transltlon dont-cares dont-cares I

The above diagram shows the operations of characterization tristate-to-one and

one-to-tristate timing on an output port, where the data path from the flop to the output

port through CL2 is to be timed. During this characterization, the shift-in operation loads

the output-enable flop with the value before the transition ('1' for case of Z-to-1

transition, '0' for case of 1 -to-Z transition). Also, the values loaded into the scan chains

(not shown in diagram) provide the post-transition value onto the D pin of the output-

enable flop. This way, during the launch cycle, when SE drops to '0' and the clock is

pulsed, the value on the D input is launched across the flop to start the propagation. A

strobe point is placed in the capture cycle to sample the post-transition value on the

output port. Test engineer can move the strobe point forward in time (possibly into the

launch cycle) to find the time of failure. The time of failure is the characterized output

timing on the port.

Throughout both the launch and capture cycles, the patterns generated by The

ATPG tool need to hold the data-out port at ' I ' to ensure that the device-level port

registers a '1' when output is enabled. The placement of this type of sensitization

behaviour can be done in the ATPG tool through provision of ATPG constraints. In fact,

due to the tool's inability to directly handle data-paths through output-enable of tristate

buffers, pattern is generated at block-level with O t + 1 transitions on the block-level

output-enable port. Then the pattern is post-processed to convert the 0 t + 1 transitions

into 1 t +Z transitions.

4.2.4 Output tristate characterization: OC+Z

Figure 10: AC-Scan output tristate characterization (0 and Z)

scan-out

+ DEVICE ( 1 block I

funaional Output to be

char'ed

Assume all flops are clocked by sys-clk in the waveform below.

Upon rising edge of sys-clk, assume each flop is driven by D input when SE=O; otherwise, take in Q of previous flop in the scan chain.

I scan-in

tester cycle

shift in shift out . S Y S - C ~ ~ J n l n l n

I

ZtoO transition I I dont-cares

functional /I f output I sample \ I

O ~ O Z transman dont-cares dont-cares I I

Similar to the above section, this operation is to time transition between tristate

and '0' on an output or bi-directional port. The above diagram shows the operations of

characterization tristate-to-zero and zero-to-tristate timing on an output port, where the

data path from the flop to the output port through CL2 is to be timed.

During this characterization, the shift-in operation loads the output-enable flop with

the value before the transition ( ' I ' for case of Z-to-0 transition, '0' for case of 0-to-Z

transition). Also, the values loaded into the scan chains (not shown in diagram) provide

the post-transition value onto the D pin of the output-enable flop ('0' for case of Z-to-0

transition, ' I ' for case of 0-to-Z transition). This way, during the launch cycle, when

scan-en (SE) drops to '0' and the clock is pulsed, the value on the D input is launched

across the flop to start the propagation. A strobe point is placed in the capture cycle to

sample the post-transition value on the output port. A test engineer can move the strobe

point forward in time (even into the launch cycle) to find the time of failure. The time of

failure is the characterized output timing on the port.

Throughout both the launch and capture cycles, the patterns generated by the

ATPG tool need to hold the data-out port at '0' to ensure that the device-level port

registers a '0' when output is enabled. In fact, due to the tool's inability to directly handle

data-paths through output-enable of tristate buffers, pattern is generated at block-level

with O t + l transitions on the block-level output-enable port. Then the pattern is post-

processed to convert the O t +l transitions into O t + Z transitions.

4.3 Path-Delay ATPG Flow Description

The diagram below is a flow chart of AC-Scan ATPG flow for Path-Delay model.

It is broken down into 3 stages each of which is described more closely in the

paragraphs following. Please note that for If0 characterization, since there are four

types of characterizations as described in the last section, each type of characterization

requires a separate invocation to the flow.

Figure 11 : AC-Scan path-delay ATPG flow-chart

------- running \

~ d o ~ a c s ~ a n ~ ~ a t h d e l a y ~ v e c . p r l \ , I

Generate all files to pmxxxx-acscan-pathdly.data automate flow

I

pt-paths-report: PrimeTime 1

Map func. clock pmxxxx-targeted-paths*.txt: report containing paths to be - domains to scan path definition files to Fastscan. sensitized domain groups (translated from pt-paths-report) -

I

FastScan CPA --

pmxxxx iochart.wgI: AC pathdely WGL pattern, to be sim'ed.

I I i

I 3 I I Pattern post-processing pmxxxx .iocharpmapready.wgl:

with VTRA N and I10 char. WGL pattern ready for process-aciochar wql.prl mappmg. -

I done

4.3.1 Generate all files to automate flow

This is stage one of the flow. Since the flow is automated and generic across all

designs, user only needs to enter information needed by the flow in a design-info file.

This file is fed into a script that generates all files necessary to accomplish downstream

tasks. There are several advantages to this type of design methodology:

Automated design flow shortens development time and increases efficiency, allowing

designers to concentrate on tasks specific to the design.

A single data file at the start of the flow creates a clear point of intrusion, making

debugging easier.

Running through ATPG requires expertise in the area of DFT. The flow is designed

to provide a higher level of abstraction to allow designers without relevant knowledge

and/or experience to still accomplish their jobs. In essence, automation flows are

designed to shield designers from the intricate details of the tools contained within.

This creates more clear-cut knowledge boundaries and better technical organization.

Included as part of the file generation is a list of path files each containing the

critical paths within a clock domain. There are as many of these files as the number of

clock domains.

4.3.2 Performing ATPG in AC path-delay mode

This is stage two of the flow. In this stage, ATPG is run for each group of critical

paths. If run successfully, each ATPG will result in the generation of a WGL

pattern/vector file containing AC-scan path-delay patterns targeting a collection of the

paths for a specific clock domain. These WGL patterns are to be translated into

simulation testbenches and simulated for verification. For speed-gradinglbinning ATPG,

the flow ends here. For I10 characterization, since the vectors are generated at block-

level, the next step prepares for vector mapping to device-level.

4.3.3 Pattern post-processing

As mentioned, all characterization patterns are generated at block-level to avoid

tool limitation in handling tristate logic. Therefore patterns need to be mapped to device-

level to be useable on production testers. For example, for tristate-to-one check on a

particular pin, the actual tristate logic is outside the block-level circuit. Therefore The

ATPG tool is fed a path from the output flop to the block-level output pin leading to the

output-enable of the tristate logiclpad. Consequently, the block-level pattern generated

by The ATPG tool only records '1' and '0' on the output-enable pin. In order to properly

map the pattern to device-level, the pattern post-processing step creates a new pin that

translates '1' and '0' on the output-enable pin into 'Z' and '1' respectively.

Other than pattern manipulation for tristate checks, other types of

characterization may also require post-processing. This usually is related to scan-mode

pin sharing. All scan-inlout pins of the device are shared with functional 110s at the

boundary of the device, i.e. outside the block. This means that for a device I10 reused

for scan-inlout, the logic to realize the sharing of functionality (between scan and normal

modes) is not present at block-level. i.e. the block would contain two pins which will be

merged outside. Therefore in this situation, when the ATPG tool generates patterns on

the block, it will place patterns onto both pins. Care has been taken to ensure that the

patterns in both pins (stimuli or strobes) do not conflict with each other. The post-

processing step would merge values on the two pins into a new pin.

4.4 AC Path-delay ATPG Issues

This section discusses some of the issueslpeculiarities related to AC path-delay

ATPG. More specifically, emphasis will be placed in deciding on acceptable test

coverage, choosing a good clock source for I10 characterization, and path-selection.

4.4.1 Acceptable test coverage

For speed-gradinglbinning, while having the mindset of trying to achieve as high

a test coverage as possible on the critical paths, there is no absolute requirement for the

test coverage. If a number must be given, experience has shown that anywhere

between 40-60% should be reasonably achievable. Since this test is not meant to detect

manufacturing defects on the die, it does not require extremely high coverage to find

gross performance outliers. Instead, it is trying to detect the effect of fabrication process

variations on the DUT. Process variations, such as small shifts of doping level from

wafer to wafer, provide gradual changes in circuit performance between neighbouring

dies on a wafer or between dies on neighbouring wafers. [ I 21 They affect entire dies,

contributing to a slight overall performance enhancement or degradation of the dies.

Therefore, testing a few representative paths may be adequate to gauge the overall

performance of the device. The important point here is to ensure that the critical paths

are targeted for ATPG. This may be somewhat tricky as, due to process variation, the

critical paths reported from static-timing analysis of the design during chip development

may become faster paths in silicon, making the other presumably shorterlfaster paths

the speed-determining paths. Therefore, it is necessary to choose at minimum a

collection of paths that represents the top 10% of the total paths in terms of length. Path

selection is discussed further in a later section.

On the other hand, 100% test coverage is required for I10 characterization. In

these types of tests, each path chosen to be targeted for ATPG represents a part of a

timing specification. Because every timing specification is required to be verified, every

path must be properly sensitized, meaning 100% test coverage is a hardened

requirement. If this is not achievable, functional vectors need to augment the AC-Scan

I10 characterization vectors.

4.4.2 Clock source for I10 characterization

It is common for an input or output timing specification to be based on input clock

edges. This characteristic provides an interesting challenge for scan-mode testing.

More specifically, a typical functional-mode device-level clock pin toggles all flops under

its control through a delay-lock-loop circuit (DLL). A DLL compensates for the delay

introduced by the clock tree by adding the correct amount of extra delay to the clock line

such that the clock signal at the device-level input clock and the signal at the clock pins

of the associated flops are exactly 360-degrees out of phase. [I 31 In other words, the

clock signal at the source and leaves of the clock tree are synchronized, but off by

exactly one full clock cycle.

During scan, the DLL stops working because its flops operate as elements in the

scan chain. However, for I10 characterization, the DLL needs to be operational and

locked at the right functional clock frequency. This causes a logical conflict as AC-Scan

path-delay ATPG for I10 characterization requires that the scan chain structure be

maintained, and that the DLL be in functional state. The workaround to this issue is to

scan the DLL separately from the rest of the digital logic such that the device contains

two scan modes, one for the DLL and one for everything else. This way, path-delay I10

characterization ATPG would have access to the necessary scan chains and the

functionalities of the DLL.

4.4.3 Path Selection Issues Related to Speed-GradingIBinning

Data paths selected for path-delay ATPG is usually chosen for their propagation

time. The propagation time is reported from static timing analysis (STA) of the design by

using commercial STA software such as PrimeTime from Synopsys Inc. Since STA

results are obtained from timing models of gates and calculated timing delays of wires,

actual timing of the selected path in silicon may be different from the STA report. This

presents a challenge for speed-gradinglbinning as a device that passes path-delay test

at a particular clock frequency may in fact be faster or slower than the tested speed.

Furthermore, actual critical paths in silicon may not even be reported as critical paths by

STA prior to chip tape-out. Consequently, a way to make better use of path-delay ATPG

is to find correlation between STA data, tested device speed, and actual device speed.

Also, some techniques are needed to select the right collection of paths.

4.4.3.1 Some path-selection techniques

Since speed-gradinglbinning essentially checks for the effect of process variation

on the IC's performance, it is important to choose a set of data paths that are evenly

distributed across the die of the chip. [ I 41 Also on the same note of testing process

variation, Dr. Karim Arabi introduced the following concept during a verbal discussion of

this issue. It is described as follow: All data-paths within a device can be classified

according to their timing, forming a timing distribution. One can choose a group of timing

paths within each section of the timing curve and test each group according to their

maximum allowable clock frequency. The relative numbers of paths within all groups

form a statistically representative sample of the full timing distribution. All targeted paths

must pass path-delay test at their respective speed to qualify the device. This approach

is better than only selecting the critical paths because it qualitatively verifies the

correlation between tested speed and STA data.

4.4.3.2 Correlating path-delay results with actual device speed

A paper written by Bruce D. Cory, Rohit Kapur and Bill Underwood titled "Speed

Binning with Path Delay Test in 150-nm Technology" introduced an interesting idea for

correlating path-delay results with actual device. It may be possible to apply its concept

here to increase the usefulness of path-delay ATPG in the context of this project. Here

is a brief description of the correlation process.

After tape-out of any design, the product will progress through a prototyping

phase before production release. Prior to production release, the fabricated IC will be

thoroughly tested to verify its behavioural functionality. This is not done by running scan

vector on ATEs, but by plugging the device onto a system board that is designed in

parallel to the design of the device itself. The board is meant to validate all

functionalities of the device at the required clock speed to flush out all functional bugs.

(Hopefully, there are no critical bugs discovered at this stage that warrants revisions of

the design). This type of functional tests alone is actually sufficient to speed-gradelbin

the chip. However, the test is usually too time-consuming to be economically performed

on every device. This is the reason for path-delay ATPG. Since the functional

prototyping tests can qualify the device at a maximum clock frequency (referred to as

Fmax), and path-delay test can be performed on the same device, the failed clock

frequency obtained from the path-delay test can be correlated with that of the functional

test. [I 41 From this correlation, future devices produced after production release can

simply be tested through path-delay scan vectors and qualified at a particular bin

according the correlation relationship. The test-case shown in the aforementioned paper

provided very clear and linear correlation between path-delay results and functional test

results.

CHAPTER 5 AT-SPEED ATPG ON RAM INTERFACE

5.1 Overview of RAM Interface

Since RAM is a complex sequential module, testing its interfaces requires extra

ATPG effort than the normal AC-Scan transition-delay ATPG. More specifically, it

invokes the multi-load ATPG algorithm where a particular fault is tested through multiple

shift loads of scan data. [I 51 For reasons of toollpattern efficiency, this algorithm should

be invoked separately on logic requiring its presence only (namely RAMS).

Consequently, a flow for testing the RAM interfaces is developed as a standalone

component, separated from the normal (or single-load) AC-scan ATPG flow. Since RAM

testing with scan chains is relatively new to PMC-Sierra, much of the effort spend here

involves understanding behaviour of the ATPG tool during the operation. The following

sections will describe, among other topics, the author's understanding of how the ATPG

tool targets and tests the different RAM interfaces with its built-in ATPG algorithms. This

understanding was obtained through generating scan patterns on the interfaces of a

test-case, simulating the patterns to ensure its integrity, and painstakingly tracing logic

states of the RAM interfaces to interpret the purpose of all state changes on relevant

nodes of the design asserted by the ATPG tool. This was a necessary and important

step in gaining confidence in the tool's ability in order to develop an automation flow

around it and release the flow to the design community of PMC-Sierra.

Figure 12: Simplified diagram of RAM interface

scan-out

- Assume all flops are clk pin of RAM are toggled by the same clock source

MEB - There may also be logic between peripheries of the RAM and the flops attached

OEB to them. They are omitted here for simplicity.

1 CLK

The above figure shows a simplified view of the RAM interface. Logic excluded

from the diagram is mainly Built-In Self-Test (BIST) bypass logic between the

peripheries of the RAM and the attached flops. The scan chain, shown as the arrowed

line from scan-in to scan-out through all flops, does not dictate the order of the flops in

the chain, but merely depicts that the sequential elements are scan-inserted and serve

as proper control and observe points during ATPG. At this time, it is worthwhile to briefly

describe the following pins of the RAM:

5.1.1 MEB: Active-Low Module Enable

This is the active-low module-select pin. Its functional logic needs to be gated

with SE such that the RAM module is disabled during scan-shift operation. This is

because of the multi-load nature of the RAM patterns. Much of the testing is done

through writing data into certain RAM locations from data shifted in on one scan load,

and reading from those locations at the end of the next scan load. In order to preserve

the content of the write operations through scan shifting, RAM needs to be disabled.

5.1.2 OEB: Active-low Output Enable

This is the active-low output-enable pin, and is present in some RAM modules.

As part of the PMC standard, OEB is tied to '0' such that output is always enabled.

5.2 At-Speed ATPG Methodology for RAM Interface

There are three main interfaces to be tested: DIN, ADDR, and DOUT. Testing

each interface requires a different ATPG approach, as implied by the ATPG tool. Please

note that the ATPG methodology described here uses only transition-delay fault model

because path-delay ATPG on RAM interfaces is not yet available at the time of this

project. However, the impact of this deficiency is minimal to the higher-speed RAMS as

the RAM interfaces are usually directly registered with minimal combinational gates in

between. This makes the path-delay coverage implied if the interfaces are well tested in

transition-delay mode.

5.2.1 Testing DIN of RAM

The diagram below depicts the strategy for testing DIN of RAM by the ATPG tool.

This test involves two scan loads, one to write complementary bits to two different

locations (marked as IocationA and IocationB in the above figure) in the RAM, and the

other to read the content from those locations. Here is the description in detail:

Figure 13: Testing DIN of RAM

scan-out


- There may also be logic between peripheries of the RAM and the flops attached to them. They are omitted here for simplicity.

> CLK

tester

sh~f l sh~f l launch caplure shdt . C I O C ~ . . . n ... n ! rin I n

I I (3) (4)

s c a n - e n -I

First apply a scan load (shift) that provides the following setup: Q pins of the ADDR

flops would select IocationA once latched in; D pins of the ADDR flops would select

location6 once it is clocked to the Q pins and latched in; Q pins of the DIN flops

would drive data (call it dataA) into IocationA upon a clock pulse; D pins of the DIN

flop would contain bits that are complementary to dataA (call the D pin values data6)

and should be driven into location6 after two consecutive clock pulses; Active-low

write-enable (WEB) is setup to be '0' for the next two clock pulses to allow for two

consecutive write operations.

As marked in the diagram as (A), scan-en drops to '0' and a clock pulse is issued to

write dataA into IocationA.

3. As marked in the diagram as (2), a second clock pulse is issued to write dataB into

IocationB in at-speed fashion. At this stage, the DIN input bus of the RAM has been

sensitized at-speed due to the double pulse from (1) and (2). If there is an at-speed

defect on a bit of the bus, that bit in IocationB would have falsely received the

corresponding bit in IocationA. The rest of the sequence reads out the content of

IocationB to verify the write operaion.

4. Apply another scan load (shift) that provides the following setup: Q pins of the

ADDR flops would select IocationB; WEB is setup to be '1' to allow for a read

operation.

5. As marked in the diagram as (3), scan-en drops to '0' and a clock pulse is issued to

read the content of locationB (dataB) to the DOUT bus of the RAM.

6. As marked in the diagram as (4), another clock pulse is issued to capture the values

on DOUT onto the DOUT flops. Then the shift-out operation pipes the content of the

DOUT flop to the scan-out port for sampling. This verifies whether the at-speed

write operation has been done fault-free, and completes the test.

5.2.2 Testing ADDR of RAM

The diagram below depicts the strategy for testing ADDR of RAM by the ATPG

tool. This test involves two scan loads, one to write complementary bits to two different

locations (marked as IocationA and IocationB in the above figure) in the RAM, and the

other to read the content from those locations. Here is the description in detail:

Figure 14: Testing ADDR of RAM

scan-out


- There may also be logic between peripheries of the RAM and the flops attached to them. They are om~tted here for simplicity.

tester

1. First apply a scan load (shift) that provides the following setup: Q pins of the ADDR

flops would select IocationA; D pins of the ADDR flops would select IocationB once it

is clocked to the Q pins; Q pins of the DIN flops would drive data (call it dataA) into

IocationA upon a clock pulse; D pins of the DIN flop would contain bits that are

complementary to dataA (call the D pin values dataB) and should be driven into

location6 after two consecutive clock pulses; WEB is setup to be '0' for the next two

clock pulses to allow for two consecutive write operations. Please note that the

actual addresses signifying IocationA and location6 are only different by a single bit.

The bit that is different is the target of the at-speed test.

cycle shlft launch capture '1 sh~ft launch capture sh~ll

2. As marked in the diagram as ( I ) , scan-en drops to '0' and a clock pulse is issued to


4

clock J-q . . . J ; n I 1 i n q... n n ! ri n 1 - 1 \

I (1) (2) 1 (3) (4) ' scan-en I - r

3. As marked in the diagram as (2), a second clock pulse is issued to write data6 into

IocationB in at-speed fashion. At this stage, the changing bit in the ADDR bus of the

RAM has been sensitized at-speed from the double-pulse generated by (1) and (2).

If there is an at-speed defect on that bit, dataB would be incorrectly written into

IocationA instead of IocationB. The rest of the sequence reads out the content of

IocationA to verify that its content is dataA, not dataB.


ADDR flops would select IocationA; WEB is setup to be '1' to allow for a read

operation.


read the content of IocationA to the DOUT bus of the RAM.


on DOUT onto the DOUT flops. Then the shift-out operation pipes the content of the

DOUT flop to the scan-out port for sampling. This completes the test.

5.2.3 Testing DOUT of RAM

The above diagram depicts the strategy for testing DOUT of RAM by the ATPG

tool. This test involves four scan loads, all of which involve read and write access to two

locations different RAM locations (marked as IocationA and IocationB in the figure

above). Here is the description in detail:

Figure 15: Testing DOUT of RAM

scan.

scan-out


- There may also be logic between peripheries of the RAM and the flops attached to them. They are omitted here for simplicity.

1. First apply a scan load (shift) that provides the following setup: Q pins of the ADDR

flops would select IocationA; Q pins of the DIN flops would drive data (call it dataA)

into IocationA upon a clock pulse; WEB is setup to be '0' to allow for a write

operation.

2. As marked in the diagram as ( I ) , scan-en drops to '0' and a clock pulse is issued to



ADDR flops would select IocationB; Q pins of the DIN flops would drive data (call it

dataB) into IocationB upon a clock pulse; WEB is setup to be '0' to allow for a write

operation.


write dataB into IocationB. Most, if not all, of the bits in dataB is complementary to

dataA.


ADDR flops would select IocationA; WEB is setup to be 'I' to allow for a read

operation.


read content of IocationA (dataA) to the DOUT ports of the RAM. This operation

presets the value of the DOUT. Since OEB is always enabled and ME6 is disabled

during shift, the DOUT value will be maintained through the next shift load


ADDR flops would select IocationB; WEB is setup to be ' I ' to allow for a read

operation.


read the content of IocationB (dataB) to the DOUT bus of the RAM.


on DOUT onto the DOUT flops in at-speed fashion. If there are no delay defects on

the DOUT pins, this clock pulse would have captured the content of dataB onto the

DOUT flops. Because many, if not all, of the bits in dataA and data5 complement

each other, the pulses in (4) and (5) together have launched a set of transitions on

the DOUT pins and captured the post-transition value onto the associated flops.

Then the shift-out operation pipes the content of the DOUT flop to the scan-out port

for sampling. This completes the test.

5.3 RAM ATPG Issue(s)

Currently, due to the ATPG tool's inability to enforce double-pulses for sensitizing

all interfaces of RAM-namely, the DOUT interface, the generated patterns may be

problematic if a PLL is used to toggle the RAM clock(s) during capture. PLL is currently

designed to issue two at-speed pulses in scan mode upon a falling edge of scan-en.

This conflicts with the setup requirement of the DOUT test which issues single-pulse

waveforms. Therefore, three workarounds are proposed here:

1. If the I10 rate of the scan-mode clock port is fast enough, disable the PLL and use

the top-level clock source directly to generate the pulses in capture mode.

2. Since PMC's PLL setup can be done solely through JTAG operations, pattern file

can be post-processed to insert JTAG sequences to bypass the PLL for the single-

pulse captures. In this case, it is important that none of the JTAG pins (tck, tdi, tdo,

tms, trstb) are shared for scan purposes. Sharing JTAG pins for other means is not

only a problem for this work-around, but also a direct violation of the JTAG

standards.

3. Reuse a top-level functional pin to control the PLL bypass mux in scan mode. This

way, the mux can bypass the PLL for the single pulses with a simple pin constraint

on the mux-control port.

CHAPTER 6 TEST-CASE RESULTS

As part of the verification for the ATPG automation flows as well as the ongoing

IC development within the company, various aspects of the flow have been subjected to

thorough testing. This chapter presents some of the results from the transition-delay

ATPG flow and the path-delay ATPG flow. Unfortunately, at this time, meaningful data

have not been collected on ATPG of RAM interface due to delay in delivery of a generic

RAM wrapper that, among other features, contains the gating logic to enable RAM

ATPG. Consequently, the flow was only subjected to a simple test-case as a proof of

concept. ATPG was successful on that test-case.

Table 1: Test-case results for AC-scan Transition-delay ATPG

gate count C clock domain count

No. of scan chains

longest chain

AC attern count L inal AC coverage

DC fault-grade coverage

DC top-up pattern count

final DC coverage

otal pattern count

Table 1 records pattern and coverage results for five design blocks subjected to

the transition-delay ATPG flow. AC pattern count and final AC coverage describe the

effectiveness and efficiency of the initial ATPG run in AC mode. The coverage ranges

between 66% and 8O0/0, and is inline with the expected coverage number of a testable

design. Please also note that, in some cases, patterns were truncated to save tester

memory. The DC fault-grade coverage describes the DC-mode coverage of the patterns

generated for AC. The DC faults not covered by the AC patterns were then targeted for

the DC-mode top-up ATPG to obtain the final DC coverage and the total pattern count.

The last row of the table records the vector count. Vector count is approximated by

multiplying the pattern count with the length (or numbers of flops) of the longest chain in

the design and, as mentioned earlier, directly impacts the availability of ATE memory

space. From this relationship, one can understand that keeping the scan chains short is

another effective way of lowering vector count. As an example, block 2 in the above

table has only half of the pattern count of block 1. However, its vector count exceeds

that of block 1 because its chain length is close to three times of the block 1's chain

length.

Figure 16 records the incremental coverage increases as AC-scan transition-

delay patterns are generated for four of the five blocks mentioned in Table 1. A notable

observation here is that some of the curves in the graph show discontinuity during the

incremental increases. These discontinuous points marks the limit of achievable

coverage for one clock domain and the targeting of another clock domain as the ATPG

tool gives up on the first one. For example, on the curve for block 3, the ATPG tool was

able to achieve as much as 42% AC coverage by targeting clock domain one. As the

number of testable faults depletes in that domain, the tool moved on to target the second

domain. Block 2 exhibits the same behaviour at approximately 73%.

Figure 16: AC Transition-delay coverage VS. pattern count

AC Transdly Coverage Incremental Increase

0 500 1000 1500 2000 2500 3000 3500

Pattern Count

Table 2 shows the test-case results for path-delay ATPG for speed-grading.

Coverage is usually between 50% and 65%. Block 3 is considered as an outlier due to

the small number of paths targeted for ATPG. Path-delay ATPG for I f0 timing

characterization has also been done on one testcase and 10O0/0 coverage was obtained

from that.

Table 2: Test-case results for AC-scan Path-delay ATPG

CHAPTER 7 CONCLUSION

With the growing complexity of today's integrated circuit designs, engineers have

abandoned the use of pure functional test vectors wherever possible, and adopted

various DFT solutions to make their designs more test-friendly. The most common DFT

approach for digital designs is scan insertion, because of its relative simplicity in

conceptual understanding and implementation automation. During scan insertion, flip-

flops in the design are converted into scan flops and linked into chains of shift registers

called scan chains. This way, data input stimuli are shifted into every flop through their

respective scan chains. Then, after the stimuli are given the time to propagate through

their functional data paths, the flops are triggered to capture the result of the

propagation. Finally, the results are marched out of the chains for strobing. The

structured nature of scan-testing paves the way to various automated test-pattern

generation tools. These tools are capable of automatically generating test vectors on a

scan-inserted design of significant gate-count based on a stuck-at fault model, usually

achieving 95% or higher test coverage within hours of processing. This, compared to

the multiples of man-weekslmonths of manual vector creation, creates great savings in

engineering and corporate resource.

As transistor size continues to shrink, new defect mechanisms start to appear

that can no longer be properly modelled by the stuck-at fault model. These new types of

defects adversely affect the speed of the IC's operation. More specifically, a defect may

no longer cause a node be permanently "stuck" at a particular logic level when a

transition is required, but cause the desired transition to take place at a much slower

speed than needed. This class of new defect mechanisms prompted the creation of the

transition-delay fault model. ATPG tools supporting this model create patterns that

launch two at-speed clock pulses in between the shift-inlout operations. The two pulses

are accompanied with test vectors that launch and capture data transitions on the

targeted node, hence testing the node in at-speed fashion. As part of this project, an

automated ATPG flow is created to generate test vectors for at-speed defect detection.

The flow involves finding the functional clock domains and their respective clock speed,

identifying the scan-mode clock sources for each functional domain, performing ATPG

on each domain in AC-scan mode to obtain transition-delay test coverage, fault-grading

the AC vectors in DC mode, and finally creating DC-mode top-up vectors. This test

approach provides good AC and DC test coverage with the least amount of vectors

possible.

Along with transition-delay fault model, another model called path-delay fault

model is also created. This fault model models an entire data path with two faults: A

slow-to-rise fault that states a 0-to-1 transition on the source flop's output pin must

invoke a transition on the input of the destination flop within a clock period; A slow-to-fall

fault that states a 1 -to-0 transition on the source flop's output pin must invoke a transition

on the input of the destination flop within a clock period. This model is ideal for speed-

gradinglbinning and can also be applied quite effortlessly to I10 characterization on a

scan-inserted design. This is because the model focuses on the transition of data-paths

rather than a particular node of a particular gate within a data-path. The basic idea of

speed-gradinglbinning is to sensitize the critical paths of the design at the highest speed

possible; similarly, I10 characterization is done through sensitisation of timing-critical

data-paths connected to the design's primary I10 ports. Therefore, both speed-

gradinglbinning and I10 characterization can be done by feeding the ATPG tool with the

paths of interest. An automated path-delay ATPG flow is created based on this

approach.

For designs containing large RAMs, data paths responsible for RAM access

sometimes becomes the most difficult to meet timing. Therefore, it is important to

guarantee that boundaries of RAMs are defect-clean so that the data-paths interfacing to

their pins can satisfy timing. Due to the relatively complex nature of RAM access

compared to operating flops, ATPG tools make use of a different algorithm called multi-

load ATPG to test RAM interfaces. A flow has been developed to test RAM interfaces

at-speed based on the multi-load algorithm. This flow is designed to target DIN, DOUT,

and ADDR pins of RAMs.

With the AC-scan pattern generation flow fully implemented, scan-inserted

devices can be tested in at-speed fashion. It is recommended that chips be subjected to

the transition-delay patterns, along with the DC top-up patterns, first to filter out all

defective parts before running the path-delay patterns.

BIBLIOGRAPHY

Kenneth D. Wagner. Robust Scan-based logic test in VDSM technologies. Computer, Volume 32, lssue 11, Nov. 1999, pp. 66 - 74

A. Kobyashi, S. Matsue, H. Shiba. Flip-Flop Circuits with Fault Location Test Capability. Proc. IECEJ national Convetional, 1 968, p. 962

Jan M. Rabaey. Digital lntergrated Circuits. 1996, Section 11.6, p. 685

Alfred L. Crouch, John C. Potter, Jason Doege. AC Scan Path Selection for Physical Debugging. Design & Test of Computers, IEEE, Volume 20, lssue 5, Sept.-Oct. 2003, pp. 34 - 40

G. Aldrich and B. Cory. Improving Test Quality and Reducing Escapes. Proc. Fabless Forum, Fabless Semiconductor Assoc., 2003, pp. 34-35.

Mukunk Sivaraman, Andrzej J. Strojwas. Path Delay Fault Diagnosis and Coverage-A Metric and an Estimation Technique. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume 20, lssue 3, March 2001, pp. 440 - 457

Xijiang Lin, Ron Press, Janusz Rajski, Paul Reuter, Thomas Rinderknecht, Bruce Swanson, Nagesh Tamarapalli. High-Frequency, At-Speed Scan Testing. Design & Test of Computers, IEEE, Volume 20, lssue 5, Sept.-Oct. 2003, Pp. 17 - 25

A. Carbine, D. Feltham. Pentium Pro Processor Design for Test and Debug. Proc. IEEE International Test Conference, IEEE Computer Society Press, Los Alamitos, Calif., 1997, pp. 294-303.

Kenneth B. Butler. Estimating Economic Benefits of DFT. Design & Test of Computers, IEEE, Volume 16, lssue 1, Jan.-March 1999, pp. 71 - 79

S. Pateras. Achieving At-Speed Structural Test. Design & Test of Computers, IEEE, Volume 20, lssue 5, Sept.-Oct. 2003, pp. 26 - 33

R. Tekumalla, P. Menon. On Redundant Path Delay Faults in Synchronous Sequential Circuits. Computers, IEEE Transactions on, Volume 49, lssue 3, March 2000, pp. 277 - 282

A. Keshavarzi, J. Tschanz, S. Narendra, V. De, W. Daasch, K. Roy, M. Sachdev, C. Hawkins. Leakage and process variation effects in current testing on future CMOS circuits. Design & Test of Computers, IEEE, Volume 19, lssue 5, Sept.- Oct. 2002, Pp. 36 - 43

[13] B. Garlepp, K. Donnelly, Jun Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Yiu-Fai Chan, T. Lee, M. Horowitz. A Portable High-speed DLL for CMOS Interface Circuits. Solid-state Circuits, IEEE, Volume 34, lssue 5, May 1 999, Pp. 632 - 644

[14] Bruce D. Cory, Rohit Kapur, Bill Underwoods. Speed Binning with Path Delay Test in 1 50-nm Technology. Design & Test of Computers, I E E E , Volume 20, lssue 5, Sept.-Oct. 2003, pp. 41 - 45

[ I 51 Mentor Graphics Inc. Design-for-Test Datasheet: ATPG with Embedded Compression. Retrieved December 22, 2004, from http://www.mentor.com/products/dft/atpg~compression/testkompress/loader.cfm? url=/commonspot/security/getfile.cfm&pageid=l5343

At-speed scan insertion and automatic test pattern...

Documents

Transcript of At-speed scan insertion and automatic test pattern...