Contactless Test of IC Pads, Pins, and TSVs via Standard Boundary Scan


Stephen Sunter and Aubin Roy

Mentor Graphics

Many mixed-signal and flash memory circuits have inherently long time constants, which impose fundamental limits on how much the test time for integrated circuits (ICs) can be reduced when testing one IC at a time. Industry generally uses multisite testing to reduce the effective test time for these ICs. To facilitate multisite testing without greatly increasing the number of automatic test equipment (ATE) channels needed, many of each IC’s pins are not contacted by the tester. By making the uncontacted pins bidirectional, designers can test them for basic shorts and opens via boundary scan.

In our ITC’10 and ITC’11 papers [1], [2], we described a built-in self-test for input/output pins (I/O BIST) that tests various delays of I/O pins via standard IEEE 1149.1 boundary scan [3]. No changes are needed in the boundary scan cells if they are similar to the BC_7 cells specified in the standard, and no changes are needed in the I/O cells. As we describe later, the delay measurements are made without delay circuitry (which would be process sensitive). Instead, a phase-locked loop (PLL) generates a slightly asynchronous sampling frequency that is applied to the capture latches of the boundary scan cells.

Testing the delays of low-speed I/Os (< 50 MHz) is generally considered unnecessary. Higher-speed I/Os, however, such as those used for double data rate (DDR) memory access, have important timing requirements and more circuitry that is susceptible to process variations. Adding delay measurement circuitry within these I/O cells would risk altering their silicon-characterized ac, dc, or electrostatic discharge (ESD) characteristics. Fortunately for our BIST approach, many DDR interfaces already have boundary scan.

The sampling instant for DDR inputs is usually controlled by a delay-locked loop (DLL) that generates a clock edge midway between received data edges. Jitter on DDR pins is sometimes tested by a BIST that drives the I/O pads with pseudorandom data, captures the pad data with increasingly off-center DLL clock edges, and detects any resulting differences in the captured data. But the timing of the DLL is untested, uncalibrated, and relatively coarse.

Editor’s notes: The performance of an IC’s inputs and outputs (I/Os) is always specified in IC datasheets and is the performance most likely to be affected by assembly steps. As the speed and number of I/Os increase beyond low-cost ATE capabilities, and I/O pads become smaller (less than 10 microns wide for 3D assemblies), built-in self-test (BIST) of this performance becomes more attractive. This article describes a BIST that exploits relatively low-speed IEEE 1149.1 boundary scan to access the I/Os and test performance with as low as 5 ps calibrated resolution, equivalent to a bandwidth approaching 100 GHz.

Shawn Blanton, Carnegie Mellon University

IEEE Design & Test of Computers, September/October 2012, ITC Special Section. Copublished by the IEEE CEDA, IEEE CASS, IEEE SSCS, and TTTC. 0740-7475/12/$31.00 © 2012 IEEE. Digital Object Identifier 10.1109/MDT.2012.2206363. Date of publication: 28 June 2012; date of current version: 23 October 2012.

For many applications, matching between I/O delays is more important than whether each pin’s performance meets absolute specifications. In DDR interfaces, each I/O within a group of pins is designed to perform identically to the other I/Os in the group. Nevertheless, process variations create interpin differences in duty cycle, clock-to-Q delay, and slew rate, and these differences cause timing skew. None of the published BIST techniques that we are aware of, including [4]–[7], measures pin-to-pin performance matching, although [5] provides pass/fail testing.

Many time-to-digital converters (TDCs) are described in the literature. Most are based on Vernier delay-line or oscillator techniques, or a combination of the two [8], and are aimed at measuring one-shot events, such as the phase delay of each clock cycle of a PLL. None of these techniques is suitable for measuring I/O delays without connecting additional circuitry to each I/O.

The development of three-dimensional (3D) devices containing stacked ICs has created an interesting test challenge, and so has an intermediate form, designated 2.5D, that places ICs on a silicon substrate “interposer.” In both cases the ICs are connected by through-silicon vias (TSVs) instead of printed wires. An IC may contain thousands of TSVs. Compared to conventional bond-pad I/Os, TSVs are more numerous, much smaller, and driven by weaker I/O circuitry, so probing these I/Os risks damaging or overloading them and requires many more ATE channels. Furthermore, the quality of the connections between the ICs is presently a significant yield limiter for the assemblies. An on-chip solution is therefore needed for both prebond and postbond testing of TSV quality.

Measuring I/O delays via boundary scan

Our technique is based on the “I/O wrap delay” measurement technique first described in [4]. That technique used a precision-delayed strobe pulse generated by ATE or by a proposed on-chip delay line. The strobe pulse was delivered to a capture latch with increasing delays until the captured result changed.

In our case, the clock for the Capture latches in the boundary scan register (BSR), shown in Figure 1, is derived from a frequency that is asynchronous to, but coherent with, the frequency used for the Update latches and the core logic of the IC. The ratio between the frequencies of the “Async” clock and the “Ref” clock is made equal to (N − 1)/N, where N is an integer typically between 10 and 4000. The test access port (TAP) clock, TCK, is not used for this sampling because its frequency is typically not constant and it may have considerable jitter.

The technique’s measurement resolution T_RES is equal to the Ref clock’s period divided by N − 1. Figure 2 shows waveforms for an example 3-bit BSR (L_BSR is its length, in bits) when N = 6.
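
The sampling principle can be made concrete with a minimal behavioral model. This is a sketch under simplifying assumptions and does not represent the BIST’s actual logic. It assumes that each Ref edge launches a transition that reaches the sampled node after a fixed delay, and it ignores clock skew and jitter. All times are exact rationals in picoseconds. The function names are hypothetical.

```python
from fractions import Fraction


def async_period(t_ref, n):
    """Async period when f_async / f_ref = (N - 1) / N."""
    return Fraction(t_ref) * n / (n - 1)


def resolution(t_ref, n):
    """Phase step between successive samples: T_RES = T_REF / (N - 1)."""
    return async_period(t_ref, n) - Fraction(t_ref)


def node_value(t, t_ref, delay):
    """Hypothetical stimulus: Ref edge m launches value m % 2, arriving `delay` later."""
    return ((t - delay) // t_ref) % 2


def first_arrival_index(t_ref, n, delay):
    """Index k of the first Async sample that sees the newly launched value.

    For k < N - 1, the k-th Async edge lands k * T_RES after Ref edge k, so
    k is the delay rounded up to a whole number of T_RES steps. Returns None
    when the delay lies outside the one-period measurement range.
    """
    t_ref, delay = Fraction(t_ref), Fraction(delay)
    t_async = async_period(t_ref, n)
    for k in range(n - 1):
        t_sample = k * t_async
        m = t_sample // t_ref  # index of the most recent Ref (launch) edge
        if node_value(t_sample, t_ref, delay) == m % 2:
            return k
    return None
```

Take T_REF = 40 ns and N = 2001, which gives an Async period of 40.02 ns. Each sample then slips 20 ps further behind its Ref edge. A 719 ps delay is first observed at sample 36, so it is resolved to the next 20 ps step (720 ps). A delay of a full Ref period or more returns None, which models an out-of-range measurement.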

The BIST and BSR are initialized and loaded via the 1149.1 TAP at TCK clock rates. The mode_2 control then selects a delay path in each I/O cell, as shown in Figure 1. Next, the BIST’s Ref and Async clock inputs are enabled synchronously. At least one of these clocks must be generated by a PLL, which may be on-chip or off-chip.

Initialization occurs when the Ref and Async clock edges align. After it, the BSR is clocked continuously, and the first Async clock edge chosen for a Capture occurs up to one half clock period in advance of the Update edge. This ensures that up to one half clock period of skew in clockDR relative to updateDR can be tolerated. The captured data is then shifted to the BIST circuitry while the data for the next update is shifted in.

The BIST module has multiple “measurement slices.” Each slice contains counters and a register, and each slice’s register is loaded with a different I/O address (a bit position in the BSR). As boundary scan data is shifted out, a master counter is incremented. When its count equals the value in one of the slice address registers, the measurement counter in that slice is incremented if an edge was detected earlier in the sampled data at that address. Simultaneously, the data shifted into the BSR for the associated I/O is its previous output logic value, inverted. Boundary scan data for all uninvolved I/Os is simply recirculated so that those I/Os remain stable.

Figure 1. Bidirectional I/O, indicating pad and reference delay paths.

Uniquely, this I/O BIST can be inserted into a design after functional design, most of the layout, and even timing closure are complete. As shown in Figure 3, a multiplexer in the BSR design lets the I/O BIST substitute new BSR clocks, modes, and data. The timing of these substitutes needs to be verified only within the BIST module.

Figure 3. BIST connection to 1149.1 TAP, BSR, and system PLL.

Figure 2. Waveforms for N = 6 and L_BSR = 3, when sampling midrange.

The BIST measures time intervals or delay differences in three ways:
- from one edge to another of the same signal;
- from a reference edge to a resulting signal edge for one path versus another (e.g., the Reference path versus the Pad path in Figure 1);
- from a reference edge to a resulting signal edge for one condition versus another (e.g., high versus low drive).

Measurements are performed by capturing all output values N times for one path or condition, uniformly spanning the measurement range (one clock period), and then again for a second path or condition. Each measurement slice counts from when its pin’s captured data rises (or falls) for the first path or condition to when it rises (or falls) for the second.

For most measurements, the time to test each group of I/Os is

$$T_{\mathrm{TEST}} \approx 4 \cdot \left\lceil \frac{L_{\mathrm{BSR}}}{N} \right\rceil \cdot N^{2} \cdot T_{\mathrm{REF}} \cdot R \qquad (1)$$

where T_REF is the Ref clock period and R is the measurement range in clock periods. Consider a 40 ns period, 100 ps resolution (N = 400), L_BSR < 400, and R = 1. The test time is then 26 ms per group of I/Os. For L_BSR = 400–800, it is 52 ms.

The total test time depends on the total number of pins whose delays are to be tested. Take the case above where T_TEST = 52 ms. If there were 16 measurement slices and ten times that many pins to measure, the total test time would be 10 × 52 ms = 520 ms.

The BIST circuitry requires approximately 700 NAND2-equivalent gates per measurement slice, plus 2000 gates for common circuitry. The number of slices can therefore be chosen to balance test time against silicon area, and hence to minimize test cost.
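
Equation (1) and the slice-versus-area trade-off can be checked numerically with the sketch below. It assumes that groups of pins are measured one after another with no extra overhead. It uses the gate counts quoted above. The function names are hypothetical.

```python
import math


def group_test_time(l_bsr, n, t_ref, r=1):
    """Eq. (1): approximate test time, in seconds, for one group of I/Os."""
    return 4 * math.ceil(l_bsr / n) * n**2 * t_ref * r


def total_test_time(pins, slices, l_bsr, n, t_ref, r=1):
    """Groups of up to `slices` pins are measured one after another (assumed)."""
    groups = math.ceil(pins / slices)
    return groups * group_test_time(l_bsr, n, t_ref, r)


def bist_gates(slices):
    """Approximate NAND2-equivalent gates: 700 per slice plus 2000 shared."""
    return 700 * slices + 2000


# Trade-off: more slices shorten the total test time but add area.
for slices in (8, 16, 32):
    t = total_test_time(pins=160, slices=slices, l_bsr=600, n=400, t_ref=40e-9)
    print(f"{slices:2d} slices: {t * 1e3:7.1f} ms, ~{bist_gates(slices)} gates")
```

With T_REF = 40 ns and N = 400, the model gives 25.6 ms for L_BSR < 400 and 51.2 ms for L_BSR between 400 and 800. The text rounds these to 26 ms and 52 ms. Measuring 160 pins with 16 slices then takes about 0.5 s and needs roughly 13,200 gates of BIST logic.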

Improving range, resolution, and test time

One way to reduce test time is to increase the reference frequency. This reduces both the range and the time to shift out the results, which together account for the N² term in (1). However, if the I/O path delay is greater than the usable range, that delay cannot be measured: it will be “out of range.” Here the path delay is measured from the TAP controller’s updateDR edge to the clockDR edge that captures the transition. For zero clock skew, the usable measurement range is half a clock period. This skew is relative to the nominal Update-to-Capture delay, which is always an odd number of half clock periods.

To increase the BSR shift frequency without reducing the delay measurement range, we let the captures for each path or condition span multiple clock periods, i.e., R = 2 in (1). This permits the clock frequency to be doubled, which reduces test time yet increases the usable range. For example, at 25 MHz (40 ns period) and R = 1, the usable range for less than ±5 ns clock skew is 15–25 ns. Now double the frequency to 50 MHz while maintaining ±5 ns clock skew, and double the span to two clock periods (R = 2). The new range following the Update edge becomes 25–35 ns, yet test time is reduced by 50%.

The BIST’s delay measurements mentioned thus far include rise delay and fall delay for one path or condition versus another, but other differences can also be measured. Outside the measurement slices, we provided two additional counters, an “average rise” counter and an “average fall” counter. While each slice’s rise and fall counters are being incremented, the increment signals of all slices are ORed together to increment these two counters. Comparing pin delays to these averages can be quite useful, as we shall see next.

Comparing measurements to test limits

In practice, the BIST’s operation always includes two major steps, in one of two forms:
- for characterization, a measurement followed by shifting out the contents of all measurement counters;
- for production test, a measurement followed by comparison to shifted-in test limits and shifting out pass/fail bits.

Suppose the delay counts are to be compared to test limits. After a group of pin delays has been measured simultaneously, the rise and fall delay counters of all the measurement slices are concatenated into a single circular shift register. The two average counters are concatenated into a separate circular shift register. Lower or upper per-pin test limits are then shifted in via the TDI pin of the TAP while the concatenated counters are shifted, and each limit is compared to its count to produce a pass/fail bit.
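
The production-test comparison can be modeled at the behavioral level as follows. This is a minimal sketch, not the BIST’s hardware. The (kind, value) encoding of the shifted-in limits and the convention that 1 means pass are assumptions, and the function names are hypothetical.

```python
def concatenate_counters(slices):
    """Rise and fall counters of all slices, in shift order."""
    return [count for rise_fall in slices for count in rise_fall]


def compare_to_limits(counts, limits):
    """Return one pass/fail bit per count (1 = pass).

    `limits` holds (kind, value) pairs in the same order as `counts`, where
    kind is "lower" or "upper". This encoding is hypothetical.
    """
    if len(counts) != len(limits):
        raise ValueError("one test limit is needed per count")
    bits = []
    for count, (kind, value) in zip(counts, limits):
        ok = count >= value if kind == "lower" else count <= value
        bits.append(int(ok))
    return bits


counts = concatenate_counters([(36, 35), (40, 38)])  # two slices: (rise, fall)
print(compare_to_limits(counts, [("upper", 38)] * len(counts)))
```

In silicon, the comparison happens serially as the counters and the limits shift past each other. The list comprehension here only mirrors the resulting bit stream.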

Optionally, as all the counts are shifted serially past a single subtract/compare unit, every adjacent pair of counts can be subtracted and each result compared to a respective test limit. For example, each I/O’s fall count can be subtracted from its rise count to produce a rise-fall mismatch. Similarly, each rise (and fall) count can be subtracted from the neighboring slice’s rise (and fall) count to produce a pin-to-pin rise (and fall) mismatch count. Each slice can be programmed with any I/O’s address, so the delays of any two pins can be compared. Each rise (and fall) count can also be subtracted from the rise (and fall) count of a selected reference I/O. Finally, each rise (and fall) count can be subtracted from the average rise (and fall) count.
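
The derived mismatch quantities can be sketched in a few lines. The sketch follows the text’s convention that “X subtracted from Y” means Y − X. It assumes that the average counter’s result corresponds to the arithmetic mean of the slice counts, and that a single symmetric limit is applied. The function names are hypothetical.

```python
def rise_fall_mismatch(rise, fall):
    # Each I/O's fall count subtracted from its rise count.
    return [r - f for r, f in zip(rise, fall)]


def pin_to_pin_mismatch(counts):
    # Each count subtracted from the neighboring slice's count.
    return [nxt - cur for cur, nxt in zip(counts, counts[1:])]


def versus_reference(counts, ref_index):
    # Each count subtracted from the count of a selected reference I/O.
    return [counts[ref_index] - c for c in counts]


def versus_average(counts):
    # Each count subtracted from the group average. Treating the "average"
    # counter as the arithmetic mean of the slice counts is an assumption.
    average = sum(counts) / len(counts)
    return [average - c for c in counts]


def within_limit(differences, limit):
    # One pass bit per difference, using a hypothetical symmetric limit.
    return [int(abs(d) <= limit) for d in differences]


rise = [36, 40, 37, 39]
fall = [35, 38, 37, 41]
print(rise_fall_mismatch(rise, fall), within_limit(versus_average(rise), 1))
```

Comparing each pin with the average of its own group is what makes the limits adaptive. The reference for each pin moves with the die being tested.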

This last capability can provide adaptive BIST at the subdie level, as forecast in the ITRS 2009 roadmap [9]. In that approach, test limits are adjusted based on average measurements for structures within the same die. Because our I/O BIST’s test limits can be relative to the average delay, the absolute limits are in effect adjusted dynamically as each IC is tested. This is similar to dynamic part average testing (DPAT), except that this BIST allows simpler tests, since no computations are needed in the ATE. The applied patterns contain binary-coded test limits for the difference between each individual measurement and its I/O group average.

The advantage of adaptive testing at the subdie level is that every measurement is treated equally, including the first I/O group of the first die on the first wafer in the first lot. Defects that affect a single path are more detectable with a subdie approach. Each path is compared with paths whose process, voltage, and temperature conditions are more similar to its own than those of paths on other dies. Note, however, that our approach compares each measurement to a mean and assumes a constant standard deviation (sigma), whereas DPAT may use a dynamic sigma.

Instead of adaptive testing, pin-to-pin variations can be tested simply to detect delay faults. Such faults may lie within normal die-to-die process variations yet still be excessive, given that the I/O cells and loading were designed identically. Differences in output loading at the board level, or in 3D devices, can cause detectable delay differences. This is clearly important for high-speed parallel data buses, such as DDR interfaces, but it is also important for diagnosing TSV quality. A resistive void in a TSV can decrease the capacitive loading seen by the TSV’s driver, and hence its I/O wrap delay. A leakage path in the TSV can be detected via boundary scan, as described in [6].

Another small delay difference of interest is crosstalk, which occurs when a pin’s transition affects the transitions of neighboring pins. Our BIST measures crosstalk by changing the BSR contents between the first and second phases of the I/O delay measurement. During phase 1, neighboring pins toggle in the same direction; during phase 2, they toggle in opposite directions. Test limits are applied to the change in each pin’s measured delay, for both the rising edge and the falling edge. Test limits can also be applied to the mismatch between I/Os for this delay change.
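
The per-pin check and the mismatch check can be illustrated with the short sketch below. It assumes that delay counts from the two phases are available in T_RES units. It takes the mismatch between I/Os to be the spread (maximum minus minimum) of the per-pin changes, which is our assumption. The function names and limits are hypothetical.

```python
def crosstalk_delay_change(phase1, phase2):
    """Per-pin change in measured delay between the two measurement phases.

    phase1: counts with neighbors toggling in the same direction.
    phase2: counts with neighbors toggling in opposite directions.
    """
    return [p2 - p1 for p1, p2 in zip(phase1, phase2)]


def crosstalk_checks(phase1, phase2, t_res_ps, max_change_ps, max_mismatch_ps):
    changes_ps = [d * t_res_ps for d in crosstalk_delay_change(phase1, phase2)]
    per_pin_pass = [abs(c) <= max_change_ps for c in changes_ps]
    # Mismatch between I/Os, taken here as the spread of the changes (assumed).
    mismatch_pass = (max(changes_ps) - min(changes_ps)) <= max_mismatch_ps
    return changes_ps, per_pin_pass, mismatch_pass


print(crosstalk_checks([50, 52, 51], [51, 55, 51], t_res_ps=20,
                       max_change_ps=40, max_mismatch_ps=50))
```

In this made-up example, the second pin’s delay changes by 60 ps between phases. It fails its per-pin limit and also makes the group fail the mismatch limit.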

Measuring high-speed signal timing

The delay measurements described thus far use the Update latch in the boundary scan cells to provide the stimulus. Each time the BSR is reloaded, the data for the Update latch is inverted relative to its previous value. As described in [1], this allows many timing properties of the I/O cells to be measured. The time resolution is limited only by the frequency resolution of the PLL that provides the Async clock.

Some I/Os deliver signals at frequencies higher than the boundary scan rate, for example DDR data at 1.6 Gb/s. For these I/Os, it will usually be more accurate to measure timing parameters while data is delivered to the I/O from the core function logic instead of from the boundary scan logic.

To perform this type of measurement, the core logic must generate alternating data (1010...) at a rate synchronous to the BIST’s Ref clock. This may be accomplished in test mode with an inserted toggle flip-flop. Alternatively, combinational logic can apply a constant 0 to the even bits and a constant 1 to the odd bits of the parallel data that is converted to serial DDR data.

For example, suppose the BSR has a 20 ns period (50 MHz) Ref clock and a 20.025 ns period Async clock. The DDR logic can then deliver data at 1.6 Gb/s, and the BIST’s measurement resolution will be 25 ps. The boundary scan logic samples the high-speed data using selected edges of the Async clock and shifts the samples to the BIST at 50 MHz. The BSR mode_1 signal (see Figure 1) selects the high-speed signal from the core, and the mode_2 signal selects whether it is sampled before or after the I/O driver.

The measurement range can be reduced to span two periods of the high-speed signal, which ensures that it includes at least one rising edge followed by a falling edge. This reduces R, and the test time, proportionally.
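
The numbers in this example are mutually consistent, and a few lines of exact arithmetic confirm it. The sketch assumes 1010... data at the stated bit rate. It takes the reduced range to be two periods of that alternating signal, expressed in Ref periods (R). The function name and dictionary keys are hypothetical. The bit rate is passed as a string so that `Fraction` stays exact.

```python
from fractions import Fraction


def ddr_sampling_plan(t_ref_ps, t_async_ps, bit_rate_gbps, periods=2):
    """Arithmetic for sampling alternating high-speed data with the BIST clocks."""
    t_res = Fraction(t_async_ps) - Fraction(t_ref_ps)      # per-sample step
    n = Fraction(t_ref_ps) / t_res + 1                     # from T_RES = T_REF / (N - 1)
    bit_ps = Fraction(1000) / Fraction(bit_rate_gbps)      # one unit interval
    signal_period_ps = 2 * bit_ps                          # 1010... repeats every 2 UI
    r = periods * signal_period_ps / Fraction(t_ref_ps)    # reduced range in Ref periods
    return {
        "t_res_ps": t_res,
        "n": n,
        "samples_per_bit": bit_ps / t_res,
        "range_ref_periods": r,
    }


plan = ddr_sampling_plan(20000, 20025, "1.6")
print({key: str(value) for key, value in plan.items()})
```

For the 20 ns and 20.025 ns clocks, the sketch gives T_RES = 25 ps (N = 801) and 25 samples per 625 ps bit. The range becomes one-eighth of a Ref period. Compared with R = 1, R therefore drops by a factor of eight, and the test time drops with it.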


Duty cycle distortion (DCD)

Total DCD for an I/O signal equals the mismatch in output pad rise and fall times plus the DCD in the data signal from the core logic. To measure DCD in the core data signal, the BIST samples the signal with the BSR Capture latches (mode_1 = 0 and mode_2 = 1). It then counts the interval from each first detected rising edge to the next detected falling edge, in units of the measurement resolution. For example, there should be 25 consecutive “1” samples for 1.6 Gb/s data sampled with 25 ps resolution. This count is compared directly to test limits.
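
The counting step can be sketched over a list of captured samples. The sketch assumes ideal, jitter-free samples spaced T_RES apart. It deliberately omits the median-edge processing that the BIST uses to tolerate jitter, so it should not be read as the BIST’s algorithm. The function name is hypothetical.

```python
def pulse_width_count(samples):
    """Count samples from the first detected rising edge to the next falling edge.

    `samples` is a list of 0/1 values captured at T_RES spacing (hypothetical
    input). Returns None if no complete high pulse is present.
    """
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:   # first rising edge
            for j in range(i + 1, len(samples)):
                if samples[j] == 0:                   # next falling edge
                    return j - i
            return None
    return None


# 1.6 Gb/s alternating data sampled at 25 ps: 25 samples per bit.
ideal = ([0] * 25 + [1] * 25) * 2
print(pulse_width_count(ideal))
```

With a 25 ps step, a count of 27 instead of 25 would indicate a high pulse about 50 ps too long. That count would then be compared with the test limits.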

The detected rising edge will be affected by jitter in the clocks and by power rail noise. However, the BIST computes median edge positions [1], so it can tolerate peak-to-peak jitter as large as six times the measurement resolution.

Slew rate

Output slew rate can be tested by measuring the captured edge time for a transition through the I/O pad at two different receiver threshold voltages. The input threshold for DDR inputs is controlled by a termination voltage (V_T), which could be offset by, say, 50 mV between the first and second halves of the delay measurement. The slew rate equals the voltage change divided by the difference between the two delays. During a production test, this delay difference would be compared to test limits. For example, a 50 mV offset would produce a 25 ps delay difference for a 2 V/ns slew rate.
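
The slew-rate arithmetic is simple. One convenience is that mV/ps is numerically equal to V/ns. The sketch below assumes that the edge is approximately linear between the two thresholds, and its function names are hypothetical.

```python
def slew_rate_v_per_ns(threshold_offset_mv, delay_difference_ps):
    # Slew rate = voltage change / delay difference; mV/ps equals V/ns.
    return threshold_offset_mv / delay_difference_ps


def expected_delay_difference_ps(threshold_offset_mv, slew_v_per_ns):
    # Inverse relation, useful when deriving a production test limit.
    return threshold_offset_mv / slew_v_per_ns


def slew_from_counts(threshold_offset_mv, count_a, count_b, t_res_ps):
    # Edge-time counts from the two halves of the measurement, in T_RES units.
    return slew_rate_v_per_ns(threshold_offset_mv, abs(count_b - count_a) * t_res_ps)


print(slew_rate_v_per_ns(50, 25), expected_delay_difference_ps(50, 2))
```

At 25 ps resolution, the 25 ps difference in this example is a single count. A finer T_RES or a larger threshold offset would give a more precise slew estimate.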

Clock-to-output delay

Measuring the delay from a clock input pin to a data output pin requires sampling the two pins with zero sampling skew, which is not generally practical within an IC. However, the signals for a small number of I/Os may be captured by Capture latches clocked by a single clock buffer. In that case, the delay from an edge of one signal to an edge of another can be measured with sufficient accuracy.

Sometimes the delay of interest lies between edges of different signals that occur nearly simultaneously on I/Os spaced far apart. To measure such a delay, each delay is measured relative to an arbitrary phase reference, which in our case is the moment when the edges of the Ref and Async clocks align. The delay measured for one I/O is then subtracted from the delay measured for a reference I/O. The difference can be measured as long as the reference I/O is sampled similarly to the other I/Os (i.e., it has a similar boundary scan cell). It does not matter whether the reference I/O carries an input clock or an output clock.

Skew

Timing skew is the maximum difference in clock-to-output delay measured between any two output pins within a defined group of pins. In [5], the delay of the sampling strobe pulse is incremented continually. Skew is measured by counting the number of increments between the strobe time when the first output fails and the strobe time when the last output fails.

To measure skew with our BIST, each measurement slice starts counting samples when a first edge is detected for its respective pin and does not stop counting until the test is complete. The two counters that normally measure the average counts are modified. Each one starts counting when the first delay-count increment pulse (for rise and for fall, respectively) is detected in any measurement slice, and it does not stop counting until the test is complete. These counters therefore hold a count for the first edge on any pin instead of an average count. The difference between each I/O’s delay count and the “first edge on any pin” count is then compared to test limits, as described earlier for the average count.
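
The modified counters produce, for each pin, its lag behind the earliest pin. The sketch below models this with a hypothetical list of the sample indices at which each pin’s first edge is detected, in T_RES units. The function name is also hypothetical.

```python
def skew_counts(edge_indices, total_samples):
    """Model the modified counters used for skew (all values in T_RES units).

    Each slice counts from its own first edge to the end of the test; the
    shared counter counts from the earliest edge detected on any pin.
    """
    slice_counts = [total_samples - k for k in edge_indices]
    first_edge_count = total_samples - min(edge_indices)
    # Difference from the "first edge on any pin" count = lag behind the earliest pin.
    lags = [first_edge_count - c for c in slice_counts]
    return lags, max(lags)


lags, skew = skew_counts([36, 40, 37, 39], total_samples=200)
print(lags, skew)
```

The largest lag is the group’s skew. Per-pin limits on the lags give finer control than a single skew limit.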

Hardware results

We implemented the I/O BIST in an Altera Stratix II FPGA, synthesized from RTL with no layout guidance other than typical clock timing skew constraints. The boundary scan cells were synthesized in the logic fabric of the FPGA. The FPGA’s hard-wired boundary scan could not be used because its clocks, data, and control signals are not accessible. We measured delays for four I/O pins that were adjacent to one another on the die and in the package but not connected to the board. We chose these pins to emulate contactless testing yet maximize crosstalk. We configured the FPGA with the four outputs programmed first for 8 mA drive and then for 4 mA drive. Detailed results are given in [2], so only a summary is provided here.

The TAP interface was implemented with general-purpose FPGA I/O pins and connected to an off-the-shelf USB-to-JTAG module attached to a desktop PC. The TCK clock rate was limited by the speed of the PC software to approximately 2 MHz. Two telecom-quality PLLs generated two differential reference clocks. One PLL provided 200 MHz to the FPGA and 25 MHz to the other PLL, which in turn provided 24.9875 MHz to the FPGA. These PLLs deliver clocks with less than 0.5 ps rms jitter, which is less than that of a typical on-chip PLL. This low jitter allowed us to explore the limits of the BIST technique’s timing resolution. Within the FPGA, the 200 MHz clock was supplied as a high-speed core “data” signal to the boundary scan cells, and it was also divided by 8 to provide a 25 MHz Ref clock to the BIST. Hence, during measurements, the Ref clock period was 40.00 ns, the Async clock period was 40.02 ns, and the sampling resolution was the difference, 20 ps.

I/O wrap delay

Our hardware measurements showed that I/O wrap propagation delays through the 8 mA output drivers, pad, and input receiver decreased by an average of 719 ps, or about 10%, when the drive was decreased to 4 mA (the I/Os are not connected to board wiring).

The 8 mA drive delays also decreased by an average of 24 ps when adjacent pins had the same transitions, and by 12 ps when adjacent pins had opposite transitions. The 4 mA drive delays, by contrast, were unaffected (1 ± 6 ps average difference). This dependence of delay on adjacent-pin transitions will appear as crosstalk-induced jitter in functional mode. In experiments with other 8 mA I/O pins connected to approximately 10 cm of adjacent board wiring, we measured a delay decrease of almost 100 ps for same transitions on adjacent pins and a delay increase of 100 ps for opposite transitions.

Duty cycle

Mode_1 of the boundary scan cells was set to logic 0 to select the core signal, which was the 200 MHz clock supplied differentially to the FPGA from the off-chip PLL. The positive and negative pulse widths were measured simultaneously, each from the first detected edge of the target type to the next edge of the opposite type. Because the two measurements are independent, their sum for each pin provides a checksum that should equal the period of the 200 MHz clock. It did, within 36 ps or 0.7%.

For the 200 MHz clock used as a data signal, we measured positive-negative pulse width mismatches of 68 ps and 182 ps at the two pins, corresponding to 0.7% and 1.8% DCD, respectively. Clock buffers and logic gates within the FPGA likely altered the duty cycle, so this amount of DCD seems reasonable.
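
The percentages above follow from the mismatch and the 5 ns clock period if DCD is defined as the deviation of the high time from half the period, i.e., (P − N)/(2T). That definition is our assumption, chosen because it reproduces the quoted values. The pulse widths in the usage lines are hypothetical values consistent with the reported mismatches.

```python
def duty_cycle_check(pos_width_ps, neg_width_ps, period_ps):
    """Checksum error (ps) and DCD (%) from positive/negative pulse widths.

    DCD is taken as (P - N) / (2 * T), the deviation of the high time from
    half the period (an assumed definition).
    """
    checksum_error_ps = pos_width_ps + neg_width_ps - period_ps
    dcd_percent = 100 * (pos_width_ps - neg_width_ps) / (2 * period_ps)
    return checksum_error_ps, dcd_percent


print(duty_cycle_check(2534, 2466, 5000))  # 68 ps mismatch
print(duty_cycle_check(2591, 2409, 5000))  # 182 ps mismatch
```

A nonzero checksum error, such as the 36 ps observed in hardware, flags measurement inconsistency independently of the DCD value itself.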

We used only 200 MHz because of practical limitations on conveying higher frequencies into the FPGA. Nothing in the technique prevents measuring DCD for data rates up to about 4 Gb/s with better than 1% resolution.

Discussion

We measured I/O wrap delay for a path that included only a multiplexer, pad driver, pad receiver, and another multiplexer. The delay varied by less than 25 ps when there was crosstalk from adjacent pins, but by 200 ps when the pins were connected to 10 cm of adjacent wiring. We concluded that a delay test of an uncontacted pin would not reliably distinguish 4 mA from 8 mA drive; a load capacitance is needed.

However, ICs intended for 3D applications have much lower output drive than stand-alone ICs, to save power and area. Their I/O wrap delay will therefore be more sensitive to changes in drive or loading. For example, consider 0.5 mA output drive and a 1 V signal swing. An open defect in a TSV that changes the output capacitance from 100 fF to zero will cause a 100 ps reduction in delay, which our I/O BIST could reliably detect.
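
A first-order estimate reproduces the 100 ps figure. The estimate assumes that the driver behaves as a constant-current source and that the receiver switches at half the signal swing, so Δt = ΔC · (V/2) / I. This is our reconstruction of the arithmetic, not the authors’ stated calculation. The function name is hypothetical.

```python
def delay_change_ps(delta_c_ff, swing_v, drive_ma, threshold_fraction=0.5):
    """Delay change for a load-capacitance change, constant-current driver.

    dt = dC * (threshold_fraction * V) / I; with C in fF, V in volts, and I in
    mA, the result is in picoseconds (1e-15 * 1 / 1e-3 s = 1e-12 s).
    """
    return delta_c_ff * threshold_fraction * swing_v / drive_ma


print(delay_change_ps(100, 1.0, 0.5))  # 3D-style 0.5 mA driver
print(delay_change_ps(100, 1.0, 8.0))  # stand-alone 8 mA driver
```

The same 100 fF change would shift the delay of an 8 mA driver by only about 6 ps. This is consistent with the observation that weak drivers make loading defects far easier to see.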

The BIST’s measurement technique allows arbitrary and almost unlimited timing resolution, because the resolution is set by the ratio between the Ref and Async frequencies supplied to the BIST. On the FPGA, we verified resolutions ranging from 6.4 ns to 5 ps.

When a DLL is used to measure delays, the delay resolution equals the clock period to which the delay loop is locked, divided by the number of inverters in the loop. If two DLLs are used to implement Vernier delay lines, finer resolution can be achieved, but only over a range of one clock period.

When a PLL is used, as in our BIST, the frequency resolution is independent of the number of inverters in its voltage-controlled oscillator. Instead, the resolution depends on the feedback binary divider, which is typically 6–10 bits, providing up to 1,024 steps of resolution. PLLs that use an LC-tank oscillator can achieve resolution of up to 28 bits, which provides more than a million steps of resolution per clock period.
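
The contrast between the two resolution mechanisms can be put in numbers. The DLL clock period and inverter count in the example are hypothetical. Mapping a b-bit divider onto N − 1 = 2^b in T_RES = T_REF/(N − 1) is also an assumption, made only for illustration.

```python
def dll_resolution_ps(locked_period_ps, inverters_in_loop):
    # DLL: locked clock period divided by the number of inverters in the loop.
    return locked_period_ps / inverters_in_loop


def divider_steps(divider_bits):
    # A b-bit binary feedback divider offers up to 2**b settings.
    return 2 ** divider_bits


def pll_resolution_ps(ref_period_ps, divider_bits):
    # Assumption: the finest setting corresponds to N - 1 = 2**b,
    # so T_RES = T_REF / (N - 1).
    return ref_period_ps / divider_steps(divider_bits)


print(dll_resolution_ps(2000, 16))    # hypothetical 16-inverter DLL at 2 ns
print(pll_resolution_ps(40000, 10))   # 10-bit divider, 40 ns Ref period
print(divider_steps(28))              # LC-tank PLL with 28-bit resolution
```

Adding a divider bit halves the step without adding delay elements. For this reason the PLL-based approach does not depend on process-sensitive delay cells.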

This BIST was recently implemented in an IC containing DDR3 interfaces, for use in a 3D device containing many of these ICs. The delays of most interest were those between ICs, and our I/O BIST provides two ways to measure them. First, a system clock was provided to all ICs with near-zero skew. Within each IC, the BSR capture clock was manually routed to provide near-zero-skew sampling of the DDR data signals and the system clock. This arrangement allows the delay of each DDR I/O signal to be measured relative to the system reference clock. In simulations, for any signal, the measured output delay of the transmitting IC plus the measured input delay of the receiving IC equaled the signal’s delay between the ICs.

A more practical way to detect excessive delays in signal paths between ICs is to characterize the incoming signal skew, as detected by a capture clock whose skew is constant but unknown. Our BIST allows the apparent arrival time of every I/O signal in an IC within a 3D device to be measured and characterized automatically. Individual upper and lower test limits can then be set for every signal. This approach is quite general and could be used to test signal paths between ICs in any system.

References

[1] S. Sunter and M. Tilmann, “BIST of I/O circuit parameters via standard boundary scan,” in Proc. ITC, Nov. 2010, pp. 529–551.

[2] S. Sunter and A. Roy, “Adaptive parametric BIST of high-speed parallel I/Os via standard boundary scan,” in Proc. ITC, Sep. 2011.

[3] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001, The IEEE, Inc., New York.

[4] P. Gillis, F. Woytowich, K. McCauley, and U. Baur, “Delay test of chip I/Os using LSSD boundary scan,” in Proc. ITC, Oct. 1998, pp. 83–90.

[5] M. Tripp, T. M. Mak, and A. Meixner, “Elimination of traditional functional testing of interface timings at Intel,” in Proc. ITC, Oct. 2003.

[6] S. Sunter, C. McDonald, and G. Danialy, “Contactless digital testing of IC pin leakage currents,” in Proc. ITC, Oct. 2001, pp. 204–210.

[7] N. Vijayaraghavan, B. Singh, S. Singh, and V. Srivastava, “Novel architecture for on-chip AC characterization of I/Os,” in Proc. ITC, Oct. 2006.

[8] J. Yu, F. Dai, and R. Jaeger, “A 12-bit Vernier ring time-to-digital converter in 0.13 µm CMOS technology,” J. Solid-State Circuits, vol. 45, no. 4, Apr. 2010.

[9] International Technology Roadmap for Semiconductors, 2009 Edition, Test and Test Equipment, ITRS 2009. [Online]. Available: http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009_Test.pdf

Stephen Sunter is Mentor Graphics’ engineering director of mixed-signal DFT, in Ottawa, Canada, where he has focused on the development of DFT and BIST for analog/mixed-signal ICs for over 15 years. He received the BASc degree in electrical engineering from the University of Waterloo. He is a senior member of the IEEE.

Aubin Roy is a staff engineer at Mentor Graphics and has been responsible for the development of mixed-signal DFT products for more than 15 years. He received the BASc degree in electrical engineering from the University of Sherbrooke (Canada). He is a member of the IEEE and ACM.

Direct questions and comments about this article to Stephen Sunter, Mentor Graphics.
