[IEEE 2007 IEEE International Test Conference - Santa Clara, CA, USA (2007.10.21-2007.10.26)] 2007...


Paper 4.3 INTERNATIONAL TEST CONFERENCE 1 1-4244-1128-9/07/$25.00 © 2007 IEEE

A Self-Testing BOST for High-Frequency PLLs, DLLs, and SerDes

Stephen Sunter & Aubin Roy LogicVision (Canada), Inc. Ottawa, Canada

Abstract

A built-off self-test (BOST) approach is described in which a self-testing FPGA on a device-under-test (DUT) interface board can test serializers or deserializers at serial rates up to 6.4 Gbps, and some SerDes at much higher rates. With a few on-chip undersampling latches, the same BOST can also test embedded PLLs, DLLs, clocks, and logic delays. For SerDes functions, the structurally tested parameters include jitter and its constituent components, jitter tolerance, and waveshape. For other embedded functions, the tested parameters include jitter, duty cycle, frequency, phase error, and delay. The hardware results show that picosecond resolution has been achieved, with minimal or no changes to the DUT.

1. Introduction

In 1999, Credence Systems Corp. introduced a concept they called “BOST”:

“PLLBOST is a method for measuring frequency and jitter that is used to test PLL clock generation blocks on complex devices, such as graphics controller devices, without specialized instrumentation. The test, or built-in self-test (BIST) circuit, is implemented into [a] field programmable gate array (FPGA) that is placed off the chip on the test fixture, hence BOST”[1].

Of course, the concept of adding active circuitry to a test fixture had been used for many years before that announcement, to permit low-performance ATE to test high-performance ICs; BOST changed that paradigm only by being described as a “BIST” solution that was built “off” instead of “in” the device-under-test (DUT). Typically, an FPGA on a DUT board can generate digital stimuli, or analyze digital responses, faster than a low-cost ATE can. An FPGA on a loadboard or within a test-head permits tremendous flexibility because its circuitry can be configured from an on-board flash memory or from the tester’s general memory before each production run (FPGA configuration typically requires 0.25~2 seconds and millions of programming bits, which makes it impractical to reconfigure before testing each device).

Traditionally, FPGA performance has lagged that of ASICs because FPGAs must permit reconfiguration of their logic. In the last two years, however, FPGAs have become available with >6 gigabit per second (Gbps) serializer/deserializer (SerDes) functions, which is faster than the 0.5~3.2 Gbps of mainstream ICs. With these new FPGAs, serial data rates above 6 Gbps can be generated and captured [2][3], which makes them look like a much lower-cost alternative to purchasing ATE intended for these data rates. However, there are important limitations to using SerDes FPGAs to test high-speed ICs.

One important problem for any active circuitry on a loadboard is its susceptibility to mishandling and electrostatic discharge (ESD) during the constant carrying, installation, and removal procedures that are common on a production test floor. Mishandling may damage the circuitry without any clear evidence and may cause an expensive decrease in yield or quality before the problem is located. ATE channels are not handled loosely, and the ATE periodically self-tests and calibrates them.

A limitation of using an FPGA as the active circuitry is its limited ability to adjust its input and output timing (even though some FPGAs have DLLs in some delay paths): typical logic block delays in the latest FPGAs are 30~50 ps, so this becomes the minimum controllable time increment for a delay path. Some researchers have used Vernier ring oscillators whose oscillation frequencies are adjusted by adding or removing gates from the ring, but the finest reported resolution is 55 ps [4]. At multi-Gbps rates, <1 ps resolution is needed. One solution is to use a mixed-signal IC with a custom differential layout that permits finer delay resolution. Besides requiring a custom IC, this places the measurement circuitry farther from the embedded circuit block under test, so the access path either adds jitter or makes the design of the DUT and the interface board so complex and engineering-intensive that time-to-market and development cost increase intolerably.

This paper presents an FPGA-based BOST approach that can test its own performance, including microampere pin leakage and sub-picosecond jitter, in less than 20 ms just prior to testing each DUT to detect any pin damage caused by temporary short circuits, over-voltage, or ESD. Using recently available FPGAs having up to twenty 6.4 Gbps SerDes channels and hundreds of differential I/O pins, ICs with a wide range of frequencies and performance can be tested with any ATE, or a laptop PC connected to only the JTAG pins of the FPGA. We also show how a few dozen extra logic gates on the DUT can provide nearly jitter-free access to embedded circuit blocks.

Section 2 of this paper summarizes three published digital BIST techniques that test IC pin leakages, high frequency (HF) jitter and jitter tolerance in SerDes transceivers, and low frequency (LF) and HF jitter in embedded PLLs. Section 3 shows how these circuit techniques have been implemented in an FPGA, and why we chose a particular FPGA. Section 4 describes a typical BOST implementation, including circuitry on the DUT board, and capabilities of this implementation. Section 5 provides some test results and discusses their significance.

Page 2: [IEEE 2007 IEEE International Test Conference - Santa Clara, CA, USA (2007.10.21-2007.10.26)] 2007 IEEE International Test Conference - A selt-testing BOST for high-frequency PLLs,

Paper 4.3 INTERNATIONAL TEST CONFERENCE 2

2. BIST techniques

2.1 Pin leakage current

A technique for BIST of sub-microampere leakage currents in IC pins was described in [5]. Uniquely, the technique does not require any changes to the standard boundary scan circuitry that is connected to digital pins in most new ASICs; however, the pin must be bidirectional (even if its function only requires it to be unidirectional), as shown in Fig. 1, and the TAP controller needs minor logic changes.

To measure the leakage current IIH of an input, the pin is first driven to logic 1 and then tri-stated, both under control of the boundary scan logic. For a non-zero IIH, the pad’s voltage decays towards logic 0. The pin’s logic value is then captured a precise time after tri-stating, for example 5 microseconds. If the logic value is still logic 1, then, based on the pad’s typical capacitance (3~6 pF) and input switching-point voltage (0.5~1.5 V), the leakage current can be assumed to be less than the value calculated as shown in Fig. 2 (typically 0.5~10 µA).

Fig. 1 Pad with boundary scan and I/O wrap

$I_{LEAKAGE} \approx C_{PAD}\,(V_{DD} - V_{SW})\,/\,t$

Fig. 2 Waveforms during leakage test
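The Fig. 2 bound can be evaluated directly. The following is a minimal Python sketch assuming the linear-discharge model of Fig. 2; the function name `leakage_bound` is hypothetical, C_PAD and V_SW are taken from the typical ranges above, and VDD = 2.5 V is an assumed supply voltage.

```python
def leakage_bound(c_pad_f: float, vdd_v: float, v_sw_v: float, t_s: float) -> float:
    """Largest leakage (A) consistent with the pad still reading 1 after t_s."""
    if vdd_v <= v_sw_v:
        raise ValueError("switching point must be below VDD")
    # Linear discharge from VDD to VSW by a constant current: I = C * dV / t
    return c_pad_f * (vdd_v - v_sw_v) / t_s

# C_PAD = 5 pF and VSW = 1.25 V from the typical ranges; VDD = 2.5 V is assumed
print(f"{leakage_bound(5e-12, 2.5, 1.25, 5e-6) * 1e6:.2f} uA")  # 1.25 uA
```

Doubling the capture time halves the bound, which is how a longer wait resolves smaller leakage currents.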

The technique requires additional tests, described in [5], to ensure that the pad is not simply stuck at logic 1 due to a faulty Enable. The tests are repeated for IIL (pins initially driven to logic 0), and for odd and even pins separately, to ensure that pad-to-pad leakage paths are also detected. At a 1149.1 TAP clock rate of 10 MHz, the total test time for up to 500 pins is less than 5 milliseconds because many pins are tested in parallel. Accuracy depends on how closely the actual pad capacitance and input switching-point voltage match their nominal values; the error is typically less than 30% of the measured value, even for values below 100 nA. Importantly, the measured value can be stored for each pin of a specific FPGA so that variations may be detected.
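The four passes (IIH and IIL, each applied first to odd pins and then to even pins) can be enumerated as shown below. This is only a sketch. The generator name is hypothetical, and pins are indexed from 0. Holding the non-floating neighbours at the opposite logic value is our assumption about how pad-to-pad paths are exposed. The actual sequence, including the stuck-Enable checks, is described in [5] and is not modelled here.

```python
from itertools import product

def leakage_phases(num_pins: int):
    """Yield (phase name, initial drive value per pin, pins to tri-state)."""
    for polarity, parity in product(("IIH", "IIL"), ("odd", "even")):
        test_value = 1 if polarity == "IIH" else 0
        want = 1 if parity == "odd" else 0
        floating = [p for p in range(num_pins) if p % 2 == want]
        under_test = set(floating)
        # Pins under test start at the test value; their neighbours are held at
        # the opposite value (assumption) so a pad-to-pad path also discharges them.
        drive = [test_value if p in under_test else 1 - test_value
                 for p in range(num_pins)]
        yield f"{polarity}/{parity}", drive, floating

for name, drive, floating in leakage_phases(6):
    print(name, drive, "tri-state:", floating)
```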

2.2 Undersampling-based tests of SerDes waveform, jitter, and jitter tolerance

An unlimited time resolution analysis (ULTRA) technique for measuring jitter and phase delays was described in [6]. It uses an undersampling reference clock whose frequency is slightly offset from a primary reference frequency, and the signal frequency of interest is synchronous to the primary reference frequency. Both the sampling and reference frequencies are generated by stable sources whose frequencies can be set with very fine resolution. Low-cost voltage-controlled crystal oscillators (VCXOs) typically have <1 ps RMS phase jitter and an adjustable frequency offset of up to ±50 ppm (±0.005%), but this is a very narrow frequency range. Recently available PLLs and VCOs (e.g., National Semiconductor LMK0300x series and Silicon Labs Si55x series) for telecom applications provide similar jitter and frequency resolution but with wide, programmable frequency ranges.

The effective sampling resolution of an ULTRA circuit is equal to the difference between its two fundamental clock periods. For example, if the reference frequency is 100.0 MHz and the undersampling clock is 100.1 MHz (a 0.1% offset), the sampling resolution is 1/100.0 MHz – 1/100.1 MHz ≈ 10 ps. If the frequency of interest is an integer multiple of 100 MHz, for example 1 GHz, and the undersampling clock remains 100.1 MHz, the sampling resolution is still 10 ps.
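The arithmetic above can be checked with exact rational numbers. The hypothetical helper below computes how far each new undersampled point lands from the previous one, in equivalent time, within one signal period. The 100 MHz and 1 GHz cases give the same step because a 1 GHz period divides the 100 MHz reference period evenly.

```python
from fractions import Fraction

def sampling_step_ps(f_signal_hz: int, f_undersample_hz: int) -> float:
    """Equivalent-time step, in ps, between successive undersampled points."""
    t_sig = Fraction(1, f_signal_hz)
    t_us = Fraction(1, f_undersample_hz)
    r = t_us % t_sig          # where the next sample lands within a signal period
    step = min(r, t_sig - r)  # phase slip per sample, forward or backward
    return float(step * 10**12)

# Reference 100.0 MHz, undersampling clock 100.1 MHz (0.1% offset)
print(f"{sampling_step_ps(100_000_000, 100_100_000):.2f} ps")    # 9.99 ps
print(f"{sampling_step_ps(1_000_000_000, 100_100_000):.2f} ps")  # 9.99 ps
```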

A classic problem with undersampling is the amount of low-frequency (LF) timing jitter in many clock sources, especially single-ended clocks. This timing jitter is aliased by the undersampling and can reach 40 ps RMS if the clocks are not coherent. By filtering out the jitter below, say, 100 kHz, the clock-induced timing jitter can be reduced to less than 0.5 ps RMS. The ULTRA circuit accomplishes this by finding the median edge in each transition region of samples and then capturing each sample’s value and position relative to that median, as shown by arrows in Fig. 3. A cumulative distribution function (CDF) of the jitter is accumulated and can be shifted off-chip for conversion into a probability density function (PDF, or histogram) for further analysis, such as calculation of the RMS value. For a Gaussian PDF, the RMS value can be computed efficiently on-chip by counting the time between selected percentage points of the CDF: the 16% and 84% points are separated by twice the RMS value.

The resulting analysis is insensitive to LF movement of the median, and the frequency offset controls the LF cutoff frequency. We have found that measurements at one offset frequency correlate well with measurements at offsets differing by >25%.
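The following Python software model shows why referencing each sample to the median edge of its transition region removes LF wander. All parts of the model are illustrative assumptions: the simulated bitstream, the `gap` used to group toggles into transition regions, and the window size. The real ULTRA circuit is on-chip hardware. For brevity, only rising regions are used. Times are in units of one sampling step (for example, 10 ps in the earlier example).

```python
import math
import random
import statistics

def simulate_bits(n, period, sigma, wander, seed=1):
    """Hypothetical undersampled clock, with times in sampling steps.

    Each sample lands on a different signal edge, so Gaussian jitter (sigma)
    is drawn per sample; `wander` is a slow LF phase drift (peak amplitude).
    """
    rng = random.Random(seed)
    bits = []
    for i in range(n):
        lf = wander * math.sin(2 * math.pi * i / (50 * period))
        phase = (i + lf + rng.gauss(0.0, sigma)) % period
        bits.append(1 if phase < period / 2 else 0)
    return bits

def transition_regions(bits, gap):
    """Group toggle positions into regions separated by more than `gap` samples."""
    toggles = [i for i in range(1, len(bits)) if bits[i] != bits[i - 1]]
    regions, current = [], [toggles[0]]
    for t in toggles[1:]:
        if t - current[-1] > gap:
            regions.append(current)
            current = []
        current.append(t)
    regions.append(current)
    return regions

def rising_edge_cdf(bits, regions, half_window):
    """Fraction of 1s at each offset from the median edge of rising regions."""
    offsets = range(-half_window, half_window + 1)
    ones = [0] * len(offsets)
    count = 0
    for region in regions[1:-1]:  # skip regions truncated by the record ends
        if bits[region[0] - 1] != 0:
            continue  # falling region: value was 1 before the first toggle
        m = round(statistics.median(region))  # median edge of this region
        for k, d in enumerate(offsets):
            ones[k] += bits[m + d]
        count += 1
    if count == 0:
        raise ValueError("no complete rising transition regions")
    return [(d, ones[k] / count) for k, d in enumerate(offsets)]

def rms_from_cdf(cdf, lo=0.16, hi=0.84):
    """Gaussian shortcut: the 16% and 84% CDF points are 2 x RMS apart."""
    def crossing(level):
        for (d0, p0), (d1, p1) in zip(cdf, cdf[1:]):
            if p0 < level <= p1:
                return d0 + (level - p0) * (d1 - d0) / (p1 - p0)
        raise ValueError("CDF does not cross the requested level")
    return (crossing(hi) - crossing(lo)) / 2

bits = simulate_bits(200_000, period=1000, sigma=5.0, wander=40.0)
cdf = rising_edge_cdf(bits, transition_regions(bits, gap=100), half_window=30)
print(f"estimated RMS jitter: {rms_from_cdf(cdf):.2f} sampling steps")
```

In this model the LF wander peaks at 40 steps, yet the estimate should come out close to the 5-step RMS jitter. The median tracks the slow drift from one transition region to the next, so the drift never enters the histogram.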

Fig. 3 Detection of median edge in “jitter bits”


Jitter components, such as duty-cycle distortion and transition-density-dependent phase delay (TDDD, which causes inter-symbol interference, ISI), are measured directly, using one of the reference clocks and the offset-frequency beat (produced by sampling one reference clock with the other) as time bases.

Specifically, the duty cycle of the undersampled waveform is measured by counting logic-1 samples over a programmable number of reference clock cycles, taking care to include an integer number of periods of the aliased waveform.
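The following is a sketch of the duty-cycle count. For simplicity, it assumes that the aliased period is an exact integer number of samples, and the names are hypothetical. The second print shows the bias introduced when the count includes an extra half period.

```python
def undersampled_duty_cycle(bits, aliased_period, periods):
    """Fraction of logic-1 samples over an integer number of aliased periods."""
    n = aliased_period * periods
    if n > len(bits):
        raise ValueError("record shorter than the requested number of periods")
    return sum(bits[:n]) / n

# Noise-free example: a 40% duty-cycle clock aliased to 1000 samples per period
P = 1000
bits = [1 if (i % P) < 0.4 * P else 0 for i in range(5 * P)]
print(undersampled_duty_cycle(bits, P, 3))  # 0.4
print(round(sum(bits[:3500]) / 3500, 3))    # 0.457: the extra half period biases the count
```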

Phase delays, for clock-like data patterns with different transition densities, are measured relative to the offset frequency beat.
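The pattern edge and the beat edge are located in the same undersampled time base, so their index difference maps to an equivalent-time delay. The sketch below uses hypothetical names and made-up edge indices. When the undersampling clock is faster than the reference (as in the 100.1 MHz example), equivalent time runs backwards; the signed step accounts for this.

```python
from fractions import Fraction

def equivalent_delay_ps(edge_idx: int, beat_edge_idx: int, f_ref_hz: int,
                        f_us_hz: int, signal_period_ps: float) -> float:
    """Delay of a pattern edge after the beat (reference-clock) edge, in ps."""
    # Sample n is taken at n / f_us, so its phase against the reference grid
    # moves by (1/f_us - 1/f_ref) per sample: negative when f_us > f_ref,
    # i.e. equivalent time then runs backwards through the waveform.
    step_s = Fraction(1, f_us_hz) - Fraction(1, f_ref_hz)
    dt_ps = float((edge_idx - beat_edge_idx) * step_s * 10**12)
    return dt_ps % signal_period_ps  # fold into one period of the signal

# Hypothetical median-edge indices for a 2 Gbps link, 100.0/100.1 MHz clocks
d_1010 = equivalent_delay_ps(988, 1000, 100_000_000, 100_100_000, 1000.0)
d_1100 = equivalent_delay_ps(986, 1000, 100_000_000, 100_100_000, 2000.0)
print(f"1010: {d_1010:.2f} ps, 1100: {d_1100:.2f} ps, TDDD: {d_1100 - d_1010:.2f} ps")
```

With these made-up indices, the difference between the delays of the 1010 (1 GHz) and 1100 (500 MHz) patterns, about 20 ps, would serve as a TDDD estimate.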

As described in [7], waveform characteristics such as slew rate and received amplitude are tested by injecting a DC offset into the AC-coupled differential signal, as shown in Fig. 4, and measuring the increase or decrease of the received signal’s duty cycle while a clock-like bit pattern is transmitted. For current-mode logic (CML) transmitters, the transmitted signal amplitude can also be deduced by measuring the average (i.e., DC) voltage at the transmitter’s pins via high-impedance resistors for two bit patterns with different duty cycles.
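The paper does not give formulas for these two measurements, so the following is only a first-order sketch. The offset test assumes linear edges with equal rising and falling slew rates. The CML amplitude test assumes an ideal two-level waveform. Function names and example readings are hypothetical.

```python
def slew_rate_v_per_s(dv_offset_v: float, duty_change: float, period_s: float) -> float:
    """Linear edges: an offset dv shifts the rising edge earlier and the falling
    edge later by dv/SR each, so the duty cycle changes by 2*dv/(SR*T)."""
    return 2 * dv_offset_v / (duty_change * period_s)

def cml_amplitude_v(avg1_v: float, duty1: float, avg2_v: float, duty2: float) -> float:
    """Ideal two-level waveform: average = V_low + D*(V_high - V_low), so two
    bit patterns with different duty cycles D1, D2 yield the swing."""
    return (avg1_v - avg2_v) / (duty1 - duty2)

# Hypothetical readings: a 20 mV offset moves a 1010 pattern at 1 Gbps (T = 2 ns)
# from 50% to 52% duty cycle; the PMU reads 1.00 V (1010) and 0.90 V (1000).
sr = slew_rate_v_per_s(0.020, 0.02, 2e-9)
amp = cml_amplitude_v(1.00, 0.50, 0.90, 0.25)
print(f"slew rate {sr / 1e9:.2f} V/ns, amplitude {amp:.3f} V")  # 1.00 V/ns, 0.400 V
```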

As described in [6], a structural jitter tolerance parameter that can be measured is the sampling instant within the received signal eye: its mean value and variations, both systematic and random. These allow the timing margin for additional jitter to be calculated. Another important aspect of jitter tolerance is the ability of the receiver’s equalization circuitry to compensate for TDDD. This is measured as described earlier for jitter.
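The following shows one way to turn the measured sampling-instant statistics into a timing margin. It is an assumption, not necessarily the method of [6]. The eye is taken to span one UI, and the systematic variation is treated as a peak-to-peak spread. The random part is extended to a target BER with a Gaussian Q factor. Names and numbers are hypothetical.

```python
from statistics import NormalDist

def timing_margins_ps(ui_ps, mean_instant_ps, systematic_pp_ps, random_rms_ps, ber=1e-12):
    """Return (left, right) margin between the sampling-instant spread and the eye edges."""
    q = -NormalDist().inv_cdf(ber)  # about 7.03 for a BER of 1e-12
    spread = systematic_pp_ps / 2 + q * random_rms_ps
    return mean_instant_ps - spread, ui_ps - mean_instant_ps - spread

# 6.4 Gbps -> UI = 156.25 ps; hypothetical 80 ps mean instant, 20 ps p-p, 1 ps RMS
left, right = timing_margins_ps(156.25, 80.0, 20.0, 1.0)
print(f"left {left:.1f} ps, right {right:.1f} ps")
```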

Fig. 4 Connections for testing a SerDes transceiver, using undersampling

2.3 PLLs, DLLs, clocks, and logic delays

The PLLBOST technique in [1] did not fully address the important problem of testing embedded PLLs because it required the signals-of-interest to be DUT pin signals. An approach was recently reported in [8] to measure the timing characteristics of non-SerDes signals within a DUT, especially for high-frequency PLLs that are the heart of many digital-only ICs.

The approach requires digital undersampling within the DUT using latches or flip-flops, which are differential if necessary. After a signal is sampled by an offset-frequency clock, the digital samples are shifted serially to an on-chip ULTRA circuit, as shown in Fig. 5. The input and output signals of multiple PLLs, DLLs, and clocks can be undersampled using the cascadable circuit shown in Fig. 6.

Fig. 5 Connections to clocks and PLL


Fig. 6 Cascadable sampler for each PLL

The digital samples could be shifted off-chip using a pair of function pins multiplexed for PLL testing. Capturing and shifting out pairs of samples permits simultaneous measurement of PLL input and output jitter (although test time is typically <10 ms for each jitter measurement in any case), and simplifies measurement of PLL phase error or of the delay of combinational logic paths (whose stimulus is a clock-like pattern). It is also possible to measure the jitter or phase of one signal at a time, relative to the offset-frequency beat, in which case a single serial output, for example the TDO pin of the TAP, is sufficient.

In [9], an FPGA-based BOST implementation was described that could test SerDes ICs. The relatively low-speed FPGA was connected only to the parallel ports of the DUT and to the parallel ports of a complementary ‘golden’ IC (for a serializer DUT, the golden IC was a deserializer). That approach is general, but it is not as simple or flexible as using an FPGA with embedded SerDes, as the next sections show.

3. BOST FPGA design

The BIST described in [6] was primarily aimed at enabling any ATE to test any SerDes IC without off-chip access to the parallel ports. The primary objective for the BOST is to permit any ATE plus an FPGA to test SerDes ICs with little or no on-chip design-for-test circuitry, provided that they have off-chip access to the parallel ports. The other major objective is to permit testing of ICs that have a mismatch in the number of high-speed serial transmitters and receivers, including ICs with only transmitters or only receivers.


A block diagram of the FPGA design is shown in Fig. 7. The multiplexers shown in dashed outlines simply indicate options: to minimize clock and data delays, each multiplexer is ‘permanently’ configured to one path or the other.

Fig. 7 FPGA block diagram (Parallel port width is W bits)

The FPGA is entirely controlled via an 1149.1 TAP so that the automatic test generation software used for the BIST in [6] can be reused, and so that characterization can be performed on a bench using a PC instead of ATE. To support multi-site testing, there can be one TAP output for each site (a site can have up to 8 transmitters or receivers). A variety of USB-to-JTAG (or general-purpose I/O) modules are widely available; we assigned four pins of a general-purpose I/O module as TAP pins so that our PC could control the FPGA directly.
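Because the FPGA is controlled only through the TAP, a PC driving four GPIO pins only has to generate the standard IEEE 1149.1 TMS/TDI sequences. The following is a minimal Python sketch. The state names abbreviate the 1149.1 TAP states, and `shift_sequence` and `walk` are hypothetical helpers. Writing each (TMS, TDI) pair to the GPIO module (with TCK low, then high) is left to whatever driver the module provides.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, if TMS=1)
TAP_NEXT = {
    "TLR": ("RTI", "TLR"), "RTI": ("RTI", "SelDR"),
    "SelDR": ("CapDR", "SelIR"), "CapDR": ("ShDR", "Ex1DR"),
    "ShDR": ("ShDR", "Ex1DR"), "Ex1DR": ("PauseDR", "UpdDR"),
    "PauseDR": ("PauseDR", "Ex2DR"), "Ex2DR": ("ShDR", "UpdDR"),
    "UpdDR": ("RTI", "SelDR"), "SelIR": ("CapIR", "TLR"),
    "CapIR": ("ShIR", "Ex1IR"), "ShIR": ("ShIR", "Ex1IR"),
    "Ex1IR": ("PauseIR", "UpdIR"), "PauseIR": ("PauseIR", "Ex2IR"),
    "Ex2IR": ("ShIR", "UpdIR"), "UpdIR": ("RTI", "SelDR"),
}

def shift_sequence(bits, ir=False):
    """(TMS, TDI) per TCK cycle: Run-Test/Idle -> Shift-IR/DR -> Run-Test/Idle.

    Bits are shifted in list order (put the register's LSB first); TMS is
    raised with the last bit so that the controller leaves the Shift state.
    """
    if not bits:
        raise ValueError("need at least one bit to shift")
    seq = [(1, 0)] + ([(1, 0)] if ir else []) + [(0, 0), (0, 0)]
    seq += [(int(i == len(bits) - 1), b) for i, b in enumerate(bits)]
    return seq + [(1, 0), (0, 0)]  # Exit1 -> Update -> Run-Test/Idle

def walk(seq, state="RTI"):
    """Replay a sequence; return the final state and the number of shifted bits."""
    shifted = 0
    for tms, _tdi in seq:
        shifted += state in ("ShDR", "ShIR")  # TDI is shifted on this TCK edge
        state = TAP_NEXT[state][tms]
    return state, shifted

print(walk(shift_sequence([1, 0, 1, 1], ir=True)))  # ('RTI', 4)
```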

The reference clocks are the most important signals. Most ATE, and probably all “low cost” ATE, are unable to generate clocks that have sub-picosecond RMS jitter and are separated in frequency by 100~200 ppm. Therefore, although the primary reference clock could be generated by the ATE, it is best generated by a dedicated crystal oscillator, preferably differential, and the offset reference clock is best generated using a high-quality VCO or PLL, as mentioned earlier. The FPGA includes the phase detector shown in Fig. 5, and its tri-state output drives the VCXO’s control-voltage input. This allows arbitrary, precise (±1 Hz) frequency offsets, which can be changed in the middle of a test to accelerate delay tests (test time is inversely proportional to the frequency offset) or to change the cut-off frequency for measuring HF and LF jitter. For a typical VCXO, settling time after a change in control voltage is less than a millisecond.
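The ÷K and ÷(K+1) divider labels in the block diagrams suggest how the phase detector sets the offset. The sketch below assumes that the detector compares f_ref/K with f_vcxo/(K+1), so that the VCXO locks at f_ref·(K+1)/K. The function name and the 100 MHz, K = 10 000 example are hypothetical.

```python
from fractions import Fraction

def offset_clock(f_ref_hz: int, k: int):
    """Return (f_vcxo, beat, step) for the assumed ÷K / ÷(K+1) phase-locked loop."""
    f_ref = Fraction(f_ref_hz)
    f_vcxo = f_ref * (k + 1) / k      # lock condition: f_ref/K == f_vcxo/(K+1)
    beat_hz = f_vcxo - f_ref          # = f_ref / K, the frequency offset
    step_s = 1 / f_ref - 1 / f_vcxo   # equivalent-time sampling resolution
    return f_vcxo, beat_hz, step_s

f_vcxo, beat, step = offset_clock(100_000_000, 10_000)
print(float(f_vcxo), float(beat), f"{float(step) * 1e12:.4f} ps")
```

Under these assumptions, K = 10 000 gives a 100 ppm offset, a 10 kHz beat, and a step of about 1 ps. Halving K doubles the offset, and therefore halves the test time, but roughly doubles the step.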

The two reference clock sources may be connected solely to the FPGA, which uses them internally and outputs them to the DUT, or solely to the DUT, in which case the DUT must provide output clocks to the FPGA as it would for any application logic connected to its parallel ports. The connections can be differential (preferably) or single-ended.

We found it useful to provide several monitor outputs for observation with a low bandwidth (e.g., 100 MHz) oscilloscope. One output is bit 0 of each receiver’s parallel port – this is the data that is analyzed for jitter, duty cycle, and phase shift. Another output is the reference beat frequency, after filtering by a median edge detector to remove “jitter bits” so that the waveform is also suitable for triggering the oscilloscope. A third output indicates when the BERT detects a bit error in a word – normally there will be zero errors, but when there are many, viewing the signal helps in diagnosing whether the errors are systematic or random.

It is important to note that only one bit (bit 0) of the receiver’s parallel output port is used for jitter analysis. Using more output bits might permit faster analysis or finer resolution, but any variation in the FPGA SerDes sampling point for different bits would then be added to the measured value. Our measurements show that there is always systematic sampling jitter, typically 10 to 30 ps p-p, which varies with test conditions and from receiver to receiver. By using only one output bit line and operating the receiver in its lock-to-reference mode, we use the receiver’s clock-data recovery (CDR) in a mode that has less jitter than its normal mission mode.

In addition to the parametric test capabilities described in the previous section, the FPGA design includes standard pseudo-random bit stream (PRBS) generators and bit error rate testers (BERT). These are suitable for detecting faults in the DUT’s parallel circuitry (e.g., 8B10B coder or FIFO register) if they are not tested by scan test or logic BIST, and for performing a conventional bit error rate test for many minutes in characterization to verify correlation with much faster tests for jitter.
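The paper does not say which PRBS polynomial is used. The sketch below assumes PRBS7 (x^7 + x^6 + 1) and a self-synchronizing checker, which needs no alignment with the transmitter. Function names are hypothetical.

```python
def prbs7(n: int, seed: int = 0x7F) -> list:
    """PRBS7, x^7 + x^6 + 1 (taps at stages 7 and 6); repeats every 127 bits."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out

def self_sync_errors(rx: list) -> int:
    """Count violations of the PRBS7 recurrence b[i] = b[i-7] ^ b[i-6].

    No alignment with the transmitter is needed, but each isolated bit
    error is counted three times (at positions i, i+6 and i+7).
    """
    return sum(rx[i] != (rx[i - 7] ^ rx[i - 6]) for i in range(7, len(rx)))

rx = prbs7(1000)
rx[500] ^= 1                 # inject one bit error
print(self_sync_errors(rx))  # 3
```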

The width of SerDes parallel ports is typically 8 or 10 bits for <3 Gbps, 16 or 20 bits for 2~5 Gbps, 32 or 40 bits for 4~10 Gbps, and 64 or 80 bits for >9 Gbps. These widths keep the parallel rates below the 300 MHz maximum rate that single-ended I/O pins can handle, and minimize pin count. The smallest SerDes FPGAs have over 350 signal pins, which permits full parallel and serial access to a DUT with 8 transceivers having 8- or 10-bit ports, 4 transceivers having 16- or 20-bit ports, or 2 transceivers having 32- or 40-bit ports.
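The width rule can be expressed as a small helper. It assumes the 10-bit port family (8B10B-coded ports) and a strict 300 MHz parallel-rate limit; the names are hypothetical. The second helper shows that each of the three example configurations needs the same number of parallel data pins, assuming one transmit port and one receive port per transceiver.

```python
def parallel_width(serial_mbps: int, max_parallel_mhz: int = 300,
                   widths=(10, 20, 40, 80)) -> int:
    """Smallest port width that keeps the parallel word rate below the limit."""
    for w in widths:
        if serial_mbps / w < max_parallel_mhz:
            return w
    raise ValueError("serial rate too high for the listed widths")

def parallel_data_pins(transceivers: int, width: int) -> int:
    """One transmit and one receive port of `width` bits per transceiver."""
    return 2 * transceivers * width

for rate in (2500, 3125, 6400, 10000):
    print(rate, "Mbps ->", parallel_width(rate), "bits")
print([parallel_data_pins(n, w) for n, w in ((8, 10), (4, 20), (2, 40))])  # [160, 160, 160]
```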

3.1 Choice of FPGA

In 2004, we knew of three vendors of FPGAs with embedded SerDes (Altera, Lattice, Xilinx). Initially, our FPGA requirements were: the SerDes receivers must have a lock-to-reference mode of operation (for undersampling) and DC coupling to the pins (for DC offset injection), and the FPGA must also contain widely programmable PLLs for offset-frequency clock generation and to serve as a circuit-under-test; those criteria limited us to Altera Stratix GX™ devices with 3.2 Gbps SerDes I/Os. While designing with these devices, we discovered that the transmitter and receiver of each transceiver can be clocked independently, and that serial data can be looped back internally prior to input/output buffering (bypassing pre-emphasis and equalization).

The following year, to permit much higher serial data rates, we tried Xilinx Virtex II Pro X™ devices, which have multiple 10 Gbps SerDes I/Os. The receiver inputs were AC-coupled, but by then we had demonstrated a (less accurate) technique for measuring edge slew rates without injecting an offset voltage: we measured actual vs. expected duty cycle for clock-like waveforms with different bit-wise duty cycles. While designing with these devices, we realized that each transceiver has only one reference clock input, which meant that testing the FPGA’s transceiver required loopback to another transceiver (with an offset-frequency reference clock). This was not an immediate problem, but it means that an on-chip serial loopback path cannot be used to self-test the transceiver independently of the quality of the off-chip loopback path (on the other hand, the devices uniquely offer on-chip pad-to-pad loopback).

In 2006, Altera announced its Stratix II GX™ FPGAs with 6.4 Gbps serial data rate capability. The SerDes functions in these devices proved to be similar to the Stratix GX, with respect to independent reference clock capability, DC coupling, and on-chip loopback. This FPGA family also offers an intriguing capability, from a business and IP-protection perspective: the devices have poly-silicon fuses in which a 128-bit encryption key can be programmed (once). This permits the circuit content of each device to be configured only with an appropriately encrypted bit pattern.

Therefore, we could supply users with a specific number of key-programmed FPGAs and email them different, encrypted configuration files (for storing in a flash memory), which is convenient for users as well as for us. This presents a new, secure delivery mechanism for test capability, although it is not presently possible with other SerDes FPGAs.

4. BOST implementations

A variety of FPGA-to-DUT connections are possible with the FPGA design described. Fig. 8 shows two examples that illustrate the wide range of performance that can be tested, using any tester plus the FPGA.

Fig. 8a shows connections for ICs with serial data rates less than the FPGA’s specified maximum rate (presently 6.4 Gbps); the FPGA can be connected to both the serial and parallel I/Os for a transmit-only device or a receive-only device (both are shown in the figure). The arrangement is suitable for transceivers too, permitting the DUT to have a single reference clock and the FPGA transceivers to have a single, offset-frequency reference clock. The schematic also shows a two-wire connection for a serial output for samples from the cascadable circuit of Fig. 6. As described earlier, the signal samples come from serially accessed/connected latches that undersample signals from clock inputs, PLLs, DLLs, or logic delay paths.

Fig. 8b shows connections for transceiver ICs having serial data rates too high for the FPGA’s specified maximum SerDes data rate. The FPGA is connected to only the parallel I/Os of the transceivers. Each transmitter’s serial output is looped-back to a receiver’s serial input. For this case, the receiver must be able to use a second reference clock with an offset frequency.

For diagram simplicity, the AC-coupling capacitors and voltage-injection access resistors are not shown (see Fig. 4).

Fig. 8a FPGA connections for <6 Gbps SerDes

Fig. 8b Connections for >6 Gbps


The interface board contains the following, beyond standard decoupling capacitors and connections to the ATE:
• FPGA (780-pin BGA, 29×29 mm²)
• 64 Mb serial flash memory (16-pin, 10×10 mm²), for FPGA configuration
• Two crystal oscillators (5×7 mm²) or PLLs (7×7 mm²)
• DC-offset injection and measurement-access resistors (1~5 kΩ, 0.1%) surface-mounted on serial differential pairs
• AC-coupling capacitors for serial data paths (if required by the DUT)

With the above-listed circuitry, a low cost ATE can test all the parameters that the BIST implementation can, including the transmitted waveform’s amplitude, slew rate, overshoot, jitter (RMS RJ, RMS TJ, p-p DCD, p-p ISI), and jitter tolerance (mean sampling instant, recovered clock’s RMS RJ, and systematic sampling jitter). From these, p-p DJ and p-p TJ can be estimated.

The accuracy of the test for random jitter in the recovered clock will be lower than with BIST because the parallel-rate recovered clock jitter is measured off-chip, where it is affected by simultaneously switching outputs. Measurement accuracy for the mean sampling instant will be greatly reduced due to LF delay variation in the longer clock paths and due to our inability to reduce two specific wire-delay mismatches to <50 ps in an FPGA.

PLL signals that are normally accessible at DUT pins can be undersampled by the FPGA directly. If the design-for-test (DFT) circuitry of Fig. 6 is added to the DUT design, then the jitter, phase error, output duty cycle, and frequency can also be measured for embedded PLLs and DLLs, with the same picosecond resolution as the BIST implementation.

For detection of any changes to the active loadboard circuitry (caused by mishandling or faulty DUTs), there are a variety of self-tests. The pin leakage for all of the FPGA pins, except the high-speed serial pins, can be measured, as described previously. Via the serial path access resistors, the reference voltage of the FPGA’s (and DUT’s) serial transmitters and receivers can be measured by a DC parametric measurement unit (PMU) of the ATE. Using the FPGA’s internal high-speed serial loopback capability, the jitter and structural jitter tolerance parameters can be tested. The lack of pre-emphasis and equalization in on-chip loopback will result in very different measured values than for off-chip loopback, but because the measured jitter and phase shifts are so much smaller, we expect the test to be sensitive to any damage in the region of the SerDes, including the FPGA’s power rail quality and substrate coupling.

For additional confidence in the FPGA’s SerDes quality, loopback through a special DUT board containing only loopback wires can be used for periodic calibration, as is normal for other sensitive ATE tests. In any case, the FPGA’s performance (e.g., jitter) is not characterized but simply tested to verify that it is within acceptable limits appropriate for the IC being tested: if the DUT operates at 3 Gbps, then higher limits can be set for the FPGA’s jitter than when the FPGA is testing at 6 Gbps. This will likely limit this particular FPGA to testing at rates below 6.4 Gbps; it seems reasonable to expect 5 Gbps test capability from this FPGA series.

5. Hardware results and discussion

To demonstrate the BOST capabilities described in this paper, we used an off-the-shelf development board already containing the FPGA. For all tests, the parallel clock rate (and nominal crystal frequency) was 156.25 MHz, the parallel port width (W) was 40 bits, and the serial data rate was 6.25 Gbps (the unit interval, UI = 160 ps).

Three different loopback paths were available on the board. The shortest path is approximately 2 inches of FR4 wire trace for the transmitter and the same for the receiver, plus 12 inches of coaxial cable and two SMA connectors. For evaluation of long wire paths, one of the six connected receivers has an additional 8 inches of FR4 trace, and one of the transmitters has an additional 38 inches of FR4 trace.

The board contains a 156.25 MHz single-ended output crystal with a single-ended to differential buffer, and we constructed a small daughterboard for the differential VCXO whose control voltage input was connected to a phase frequency detector output pin of the FPGA (we later concluded that an op-amp buffer is needed to tolerate the <50 kΩ input impedance of the VCXO). We chose, somewhat arbitrarily, a PLL feedback divider with K=6400 to obtain a timing resolution of exactly 1 picosecond and a frequency offset of 24.415 kHz. An equation from [6] can be used to calculate the theoretical cutoff between LF and HF jitter:
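The K = 6400 choice can be checked with a few lines of arithmetic. The sketch below is a first-order approximation only, assuming that the two reference clocks differ by f_ref/K so that the sampling phase slips by about T_ref/K per reference cycle; the exact VCXO divider arrangement is not modeled, and the helper name is hypothetical.

```python
# First-order sketch of the undersampling arithmetic (hypothetical helper).
# Assumption: the two reference clocks differ by f_ref / K, so the sampling
# phase slips by about T_ref / K per reference cycle.


def undersampling_params(f_ref_hz: float, width_bits: int, k: int) -> dict:
    """Return serial rate, unit interval, timing resolution, offset, and beat period."""
    t_ref_s = 1.0 / f_ref_hz                  # reference (parallel clock) period
    serial_rate_bps = f_ref_hz * width_bits   # W bits per parallel clock cycle
    return {
        "serial_rate_bps": serial_rate_bps,
        "ui_s": 1.0 / serial_rate_bps,        # unit interval
        "resolution_s": t_ref_s / k,          # phase slip per reference cycle
        "f_offset_hz": f_ref_hz / k,          # reference frequency offset
        "beat_period_s": k / f_ref_hz,        # time for one full phase sweep
    }


if __name__ == "__main__":
    p = undersampling_params(f_ref_hz=156.25e6, width_bits=40, k=6400)
    print(f"serial rate : {p['serial_rate_bps'] / 1e9:.2f} Gbps")
    print(f"UI          : {p['ui_s'] * 1e12:.1f} ps")
    print(f"resolution  : {p['resolution_s'] * 1e12:.3f} ps")
    print(f"offset      : {p['f_offset_hz'] / 1e3:.3f} kHz")
    print(f"beat period : {p['beat_period_s'] * 1e6:.2f} us")
```

With the board’s values this gives a 6.25 Gbps serial rate, a 160 ps UI, a 1.000 ps resolution, an offset of about 24.41 kHz (in line with the quoted offset), and a 40.96 µs beat period.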

$$A_J \cdot 2\pi f_J = |W f_{OFFSET}| \quad\therefore\quad f_J = \frac{|W f_{OFFSET}|}{2\pi A_J}$$

where A_J is the amplitude of the jitter, in units of UI, f_J is the frequency of the jitter, in Hz, W is the parallel port width in bits, and f_OFFSET is the reference frequency offset.

For our case, and assuming the standard jitter tolerance mask amplitude of 0.05 UI at the jitter corner frequency, f_J equals 3.1 MHz, which is close to the standards-specified golden PLL cutoff of 3.7 MHz derived from the data rate divided by 1667.
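Plugging the board’s values into the cutoff equation reproduces the 3.1 MHz figure. A minimal sketch, assuming W = 40, f_OFFSET ≈ 156.25 MHz / 6400 and A_J = 0.05 UI as in the text; the function names are hypothetical.

```python
import math


def jitter_cutoff_hz(width_bits: int, f_offset_hz: float, amplitude_ui: float) -> float:
    """Solve A_J * 2*pi*f_J = |W * f_OFFSET| for f_J."""
    if amplitude_ui <= 0:
        raise ValueError("jitter amplitude must be positive")
    return abs(width_bits * f_offset_hz) / (2.0 * math.pi * amplitude_ui)


def golden_pll_cutoff_hz(data_rate_bps: float) -> float:
    """Golden PLL cutoff as used in the text: data rate divided by 1667."""
    return data_rate_bps / 1667.0


if __name__ == "__main__":
    f_offset = 156.25e6 / 6400   # first-order offset for K = 6400 (~24.41 kHz)
    f_j = jitter_cutoff_hz(40, f_offset, 0.05)
    f_golden = golden_pll_cutoff_hz(6.25e9)
    print(f"LF/HF cutoff f_J  : {f_j / 1e6:.2f} MHz")
    print(f"golden PLL cutoff : {f_golden / 1e6:.2f} MHz")
```

This prints about 3.11 MHz for f_J and 3.75 MHz for the golden PLL cutoff, matching the 3.1 MHz and 3.7 MHz values quoted above.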

Instead of connecting a DC PMU to the 5 kΩ DC-offset injection resistors, we simply used pairs of digital output pins of the FPGA to inject VDD (3.3 V) or VSS. This generates ±34 mV across the 100 Ω input resistance of the receivers, sufficient to change the 160 ps pulse width of a 1010 pattern by about ±10 ps, yielding a calculated slew rate of 3 V/ns. The measured value depends, of course, on the transmitter’s amplitude and pre-emphasis settings and on the receiver’s equalization setting.
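The injected offset and the resulting slew-rate estimate can be approximated as below. This is a sketch under stated assumptions, not the paper’s calculation: a series divider with one 5 kΩ resistor per leg (the text does not spell out the topology), and a slew rate taken simply as ΔV/Δt. Under these assumptions the divider gives about 33 mV, close to the quoted ±34 mV, and 34 mV over 10 ps gives about 3.4 V/ns, consistent with the quoted figure of about 3 V/ns.

```python
def injected_offset_v(v_drive: float, r_inject_ohm: float, r_input_ohm: float) -> float:
    """Differential offset across the receiver input.

    Assumption: one injection resistor per leg, one FPGA pin driven to VDD and
    the other to VSS, so the drive sees r_inject + r_input + r_inject in series.
    """
    return v_drive * r_input_ohm / (2.0 * r_inject_ohm + r_input_ohm)


def slew_rate_v_per_ns(delta_v: float, delta_t_ps: float) -> float:
    """Assumption: slew rate estimated as offset voltage / pulse-width change."""
    return delta_v / (delta_t_ps * 1e-3)   # convert ps to ns


if __name__ == "__main__":
    dv = injected_offset_v(3.3, 5e3, 100.0)
    print(f"injected offset : {dv * 1e3:.1f} mV")
    print(f"slew estimate   : {slew_rate_v_per_ns(0.034, 10.0):.1f} V/ns")
```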

Pre-emphasis and equalization greatly affect TDDD, which is the dominant source of ISI in a band-limited signal path (in non-band-limited paths, reflections due to impedance mismatch are the dominant source). Fig. 9 illustrates the impact on phase delay that we measured for different equalization settings while applying the range of possible transition densities in the form of clock-like patterns, for 6.25 Gbps on a loopback path with 4 inches of FR4 wire trace. To perform these measurements, we first applied a reference pattern to the transmitter’s 40-bit input port comprising 20 consecutive zeroes followed by 20 consecutive ones, and measured the received data’s phase delay relative to a beat period while the receiver was in its lock-to-reference mode. This pattern has the lowest transition density (or frequency), and we declared its delay to be the reference delay. Then we applied a pattern comprising alternating groups of 5 zeroes and 5 ones, and measured its phase delay relative to the reference delay. Then we applied a pattern comprising alternating 00 and 11, and lastly an alternating zero-one pattern (the highest possible transition density).
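The four clock-like patterns and the horizontal axis of Fig. 9 can be reproduced as below. A minimal sketch; the helper names are hypothetical, and the measured phase delays that would feed tddd() come from the ULTRA measurement, which is not modeled here.

```python
import math

WORD_BITS = 40           # transmitter parallel port width used in the text
DATA_RATE_BPS = 6.25e9   # serial rate used in the text


def clock_like_pattern(run_length: int, word_bits: int = WORD_BITS) -> list:
    """One parallel word of alternating runs of zeros and ones."""
    if run_length < 1 or word_bits % (2 * run_length):
        raise ValueError("run length must tile the word evenly")
    period = [0] * run_length + [1] * run_length
    return period * (word_bits // (2 * run_length))


def fundamental_hz(run_length: int, data_rate_bps: float = DATA_RATE_BPS) -> float:
    """Fundamental frequency: one pattern period spans 2 * run_length bits."""
    return data_rate_bps / (2 * run_length)


def log2_position(run_length: int, reference_run: int = 20) -> float:
    """Horizontal position on a log2 axis relative to the HalfWord pattern."""
    return math.log2(fundamental_hz(run_length) / fundamental_hz(reference_run))


def tddd(delays_by_run: dict, reference_run: int = 20) -> dict:
    """Delay of each pattern relative to the HalfWord reference delay."""
    ref = delays_by_run[reference_run]
    return {run: d - ref for run, d in delays_by_run.items()}


if __name__ == "__main__":
    for run, name in [(20, "HalfWord"), (5, "0000011111"), (2, "0011"), (1, "0101")]:
        print(f"{name:>10}: {fundamental_hz(run) / 1e6:8.2f} MHz, "
              f"log2 position {log2_position(run):.2f}")
```

The HalfWord, 0000011111, 0011, and 0101 patterns land at about 0, 2, 3.32, and 4.32 on the log2 axis, matching the log2(20) ≈ 4.3 noted in the Fig. 9 caption.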

In band-limited paths without equalization, 0101 patterns are phase advanced relative to 00110011 patterns; adding equalization delays the 0101 transition times relative to those of low-frequency patterns. For longer loopback paths, the curves in Fig. 9 shift down and spread out. Each data point required about 40 ms of test time to achieve about 5 ps repeatability. In a production test, only one point per curve is needed, only four of the binary-controlled settings need to be tested, and the test time could be reduced to 10 ms by degrading the repeatability to 10 ps.
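The quoted trade-off (40 ms for about 5 ps, 10 ms for about 10 ps) is consistent with repeatability improving as the square root of averaging time. The sketch below assumes that 1/√T behavior; the paper does not state its noise model, and the helper names are hypothetical.

```python
import math


def scaled_repeatability(ref_sigma: float, ref_time: float, new_time: float) -> float:
    """Repeatability after changing the test time.

    Assumption: the measurement noise averages like uncorrelated noise, so the
    repeatability scales as 1 / sqrt(test time).
    """
    if ref_time <= 0 or new_time <= 0:
        raise ValueError("test times must be positive")
    return ref_sigma * math.sqrt(ref_time / new_time)


def time_for_repeatability(ref_sigma: float, ref_time: float, target_sigma: float) -> float:
    """Test time needed to reach target_sigma under the same 1/sqrt(T) assumption."""
    return ref_time * (ref_sigma / target_sigma) ** 2


if __name__ == "__main__":
    # 5 ps repeatability at 40 ms per data point, as quoted in the text
    print(f"{scaled_repeatability(5.0, 40.0, 10.0):.1f} ps at 10 ms")
    print(f"{time_for_repeatability(5.0, 40.0, 10.0):.1f} ms for 10 ps")
```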

Fig. 9 TDDD for 16 different equalization settings. The horizontal axis uses a log2 scale based on the effective frequency of each clock-like pattern relative to HalfWord (see text for explanation): the HalfWord frequency is 156.25 MHz, and the 0101 frequency is 3.125 GHz, which is 20 times higher than 156.25 MHz, and log2(20) = 4.3.

We also compared the measured TDDD to the ISI measured for an 8B/10B-compliant pseudo-random pattern by an Agilent 86100C oscilloscope equipped with DCA-J jitter measurement. Fig. 10 shows the correlation plot for loopback path lengths ranging from 4 to 40 inches. With the FPGA’s equalization enabled, the correlation R² is better than 0.98 with or without pre-emphasis. The best-fit straight line (shown) has unity slope and a -10 ps offset (due to equalization in the receiver but not in the ‘scope). With equalization disabled, there was significant discrepancy, though the best-fit line (not shown) has no offset. This test verifies an important component of jitter tolerance.
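The slope, offset, and R² figures quoted for Fig. 10 presumably come from an ordinary least-squares line fit. A minimal sketch of that computation follows; the input values in the usage are synthetic points lying exactly on y = x - 10, chosen only to exercise the code, and are not the measured data.

```python
def linear_fit(xs: list, ys: list) -> tuple:
    """Ordinary least-squares line y = slope * x + offset, plus R^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + offset)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, offset, r_squared


if __name__ == "__main__":
    # Synthetic points placed exactly on y = x - 10; NOT measured data.
    scope_isi_ps = [40.0, 80.0, 120.0, 160.0]
    bost_tddd_ps = [30.0, 70.0, 110.0, 150.0]
    slope, offset, r2 = linear_fit(scope_isi_ps, bost_tddd_ps)
    print(f"slope={slope:.3f}, offset={offset:.1f} ps, R^2={r2:.4f}")
```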

Another component of jitter tolerance is jitter in the recovered clock. The receiver’s parallel-port clock jitter was measured for its HF and LF components, with very different results. The HF jitter was 0~0.5 ps RMS, clearly below the 1.00 ps resolution that we chose (a finer resolution is possible, but would include more LF jitter). We did not have equipment to measure this level of jitter off-chip. The LF jitter was >10 ps RMS and was not Gaussian, indicating a lot of LF modulation (more than we measured in the non-SerDes PLL); however, SerDes BER performance is independent of jitter that is below the golden PLL cutoff frequency. This test verified that the CDR tracked the LF jitter.

Fig. 10 BOST-measured TDDD vs. oscilloscope-measured ISI

For RJ_RMS jitter measurements with 5 ms test times, the mean value was 1 ps RMS and the repeatability sigma was approximately 25 fs, corresponding to a ±3-sigma spread of 0.15 ps. For TJ_RMS jitter measurements with 20 ms test times, the mean value was 5 ps RMS and the repeatability sigma was approximately 50 fs, corresponding to a ±3-sigma spread of 0.3 ps. These values were about half those measured for the 3.2 Gbps Stratix GX (except that the mean TJ_RMS values were similar).

Correlations to other equipment were performed for the 3.2 Gbps Stratix GX measurements and reported in [8]. In summary, the correlation of FPGA-measured RMS HF jitter for a 1010 pattern (≈ RJ_RMS) to Wavecrest SIA 3000 RJ measurements had a coefficient of determination (R²) of 0.9994 for RJ values of 1~7 ps RMS. The correlation of BOST-measured RMS HF jitter for a pseudo-random pattern (≈ TJ_RMS) to Agilent 86100C DCA-J ISI measurements had an R² of 0.9996 for ISI of 35~190 ps p-p (RJ dominated below 35 ps), which was better than the values shown in Fig. 10 for the same conditions. We have not yet repeated these tests for the 6.25 Gbps Stratix II GX measurements, but we have no reason to expect worse accuracy.

[Fig. 10 plot, titled "ISI (p-p)": BIST/BOST-measured TDDD (ps) vs. Agilent 86100C DCA-J oscilloscope ISI (ps), both axes 0 to 200 ps; series: No pre-emphasis, Eq[1:0]=[10]; Pre[1:0]=[11], Eq[1:0]=[10]; Pre[1:0]=[11], No equalization; Ideal]

[Fig. 9 plot: TDDD (ps), -10 to 50 ps, for equalization settings 0 to 15 vs. log2 pattern frequency (0 to 5), with patterns HalfWord, 0000011111, 0011, and 0101 marked]

We measured embedded PLL jitter for different PLL loop filter bandwidths (as reported for a BIST in [8]) and recorded values ranging from 3~11 ps RMS for LF jitter and 0.1~1 ps RMS for HF jitter, as shown in Fig. 11 in comparison with Wavecrest measurements. The low value of the LF jitter measurement was a surprise: we did not anticipate that when the frequency offset is tightly controlled by a PLL (the PLL containing the VCXO, not the PLL under test), the effective LF jitter of the reference clocks is cancelled, because only the relative jitter (of one reference clock compared to the other) is important.

[Fig. 11 plot: jitter (ps RMS, 0 to 14) vs. PLL loop bandwidth (MHz, log scale, 0.1 to 10); series: WC-Relative jitter, ULTRA-LF jitter, WC-Period jitter, WC-Reference clock jitter, ULTRA-Relative jitter, ULTRA-HF jitter]

Fig. 11 Measured PLL jitter for different PLL loop bandwidths, compared to measurements using a Wavecrest DTS-2070C (“WC”)

The bandwidth and timing resolution of this test approach are encouraging. The bandwidth of this FPGA’s SerDes receiver, including equalization, is likely greater than 5 GHz, based on the <90 ps transition times we observed in the receiver when using the minimum length of bandwidth-limiting FR4 wire trace to the DUT. The timing resolution is limited primarily by the reference clock frequency offset and jitter, and by sampling jitter. Recently available discrete VCOs and PLLs (examples were noted earlier) seem to indicate that jitter specifications are continually decreasing: <1 ps RMS phase jitter is almost commonplace. And the frequency offset for a VCXO within a PLL can be kept within 2 Hz of the target value (24.415 kHz, in our case), corresponding to <0.01% variation of a 1.00 ps sampling resolution. In the several different ICs measured thus far, circuit jitter has proven to be much less than 1 ps RMS, though it is design dependent.
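The claim that a 2 Hz offset error changes a 1.00 ps resolution by less than 0.01% follows from the first-order relation between offset and resolution. A minimal sketch, assuming resolution ≈ f_OFFSET / f_ref² (so that the relative resolution error equals the relative offset error); the helper names are hypothetical.

```python
def resolution_from_offset_s(f_ref_hz: float, f_offset_hz: float) -> float:
    """First-order resolution: 1/f_ref - 1/(f_ref + f_offset) ~= f_offset / f_ref**2."""
    return f_offset_hz / f_ref_hz ** 2


def relative_resolution_error(f_offset_hz: float, tolerance_hz: float) -> float:
    """Resolution is proportional to the offset (first order), so relative errors match."""
    return abs(tolerance_hz) / f_offset_hz


if __name__ == "__main__":
    f_ref, f_off, tol = 156.25e6, 24.415e3, 2.0
    res = resolution_from_offset_s(f_ref, f_off)
    rel = relative_resolution_error(f_off, tol)
    print(f"resolution     : {res * 1e12:.4f} ps")
    print(f"relative error : {rel * 100:.4f} %")
    print(f"absolute error : {res * rel * 1e15:.3f} fs")
```

For a 24.415 kHz offset held within 2 Hz, this gives a relative variation of about 0.0082%, i.e. well under 0.1 fs on a 1 ps step.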

Time must be measured relative to a known time delay or a known frequency, and it seems that delay-line-based time measurement circuits in CMOS (including oscillators containing delay lines) intrinsically have more self-jitter than LC-oscillator-based or crystal-referenced, frequency-based measurements.

Our future plans include using the ULTRA principle to measure other I/O delays, especially double-data-rate (DDR) memory ports, since these delays are becoming more important and are too fast for low-cost ATE. Pin delays will need to be tested in parallel to achieve acceptable test times, and an FPGA is well suited to parallel processing.

6. Conclusions

BIST techniques for measuring SerDes, PLL, clock, and pin quality were exploited in an FPGA-based BOST implementation to enable any low-cost ATE to test SerDes ICs that do not have parametric BIST.

The tests for multi-Gbps data measure the waveform, jitter, and jitter tolerance with picosecond resolution. Adding a few dozen logic gates to the DUT to serially shift out data from undersampling latches permits embedded PLLs, DLLs, and logic delays to be tested with similar resolution, overcoming the access problem of PLLBOST.

The purely digital BIST for pin leakage testing ensures that the FPGA has not been damaged by mishandling or a faulty DUT, and the quick test time means it can be run for every DUT.

We believe that this paper presents the first FPGA-based BOST that achieves 1 ps measurement resolution (and comparable accuracy), that can test up to 6.4 Gbps directly or much higher indirectly, or that can test its own pin leakage to check for ESD damage.

References

1. J. Dorsch, “The Softer Side of Test: Software products to star at International Test Conference”, Electronic News, Sept. 1999.
2. Altera, Stratix II GX product datasheet.
3. Xilinx, Virtex II Pro X product datasheet.
4. A. Chan, G. Roberts, “A Synthesizable, Fast and High-Resolution Timing Measurement Device Using a Component-Invariant Vernier Delay Line”, Proc. of ITC’01, pp. 858-867.
5. S. Sunter et al., “Contactless Digital Testing of IC Pin Leakage Currents”, Proc. of ITC’01, pp. 204-210.
6. S. Sunter, A. Roy, “Structural Tests for Jitter Tolerance in SerDes Receivers”, Proc. of ITC’05.
7. S. Sunter et al., “An Automated, Complete, Structural Test Solution for SERDES”, Proc. of ITC’04, pp. 95-104.
8. S. Sunter, A. Roy, “Purely digital BIST for any PLL or DLL”, Proc. of European Test Symposium, May 2007.
9. S. Sunter et al., “A digital BIST/BOST for parametric testing of 0.5~10 Gbps serial interface”, Semicon Japan, Dec. 2006.