PMC_6G


A 4.9-Gb/s to 6.4-Gb/s Transceiver with Adaptive Equalizer for Backplane Communications

Chris Siu, Jurgen Hissen, Guillaume Fortin, Jatinder Chana, Bernard Guay, Graeme Boyd, Tony Zortea, John Plasterer, Matthew McAdam, Charles Roy, Mike Venditti, Geoff Allbutt, Gershom Birk, Kevin Betts, and Hormoz Djahanshahi

PMC-Sierra, 8555 Baxter Place, Burnaby, BC, V5A 4V7, Canada

Abstract- A 4.9-Gb/s to 6.4-Gb/s transceiver using NRZ coding is presented in this paper. To achieve low bit error rates (BER) over legacy FR4 backplanes, adaptive equalization with a multi-tap Decision Feedback Equalizer (DFE) is used. This CEI-6-LR transceiver has been implemented with production quality in a standard 0.13µm CMOS process and consumes 285 mW per link. BER of less than 10^-15 has been demonstrated over a 34-inch (86-cm) FR4 backplane.

I. INTRODUCTION

High-performance telecommunication, computing and storage systems require ever-increasing data throughput and reliability. In addition, there is a need for platforms to offer performance upgrades while preserving existing backplane designs. This results in the need for advanced serial transceiver technology with higher data rates and low Bit Error Rate (BER) performance [1]-[7]. This same trend has resulted in the creation by the Optical Internetworking Forum (OIF) of the CEI-6 and CEI-11 standards [8]. These standards offer system architects the desired improvement in data throughput while ensuring the interoperability of transceivers from multiple vendors. Such standardization efforts must also consider the implementation cost of the resulting technology.

For serial links in the multi-gigabit-per-second regime, signal integrity imposes fundamental performance limits. The severity of frequency-dependent attenuation, signal reflections, and crosstalk noise dictates the achievable BER performance for a given architecture. As these effects begin to dominate, improvements can only be achieved by increasing the complexity of the transceiver. It is therefore crucial to strike the best architectural cost vs. performance trade-off while meeting the desired BER performance.

This paper is organized as follows. First, an overview of the transceiver architecture is provided. Next, we discuss the major sub-blocks, namely clock synthesis PLL, transmitter and receiver. Finally, measurement results, including BER performance over different backplanes, are provided.

II. ARCHITECTURE OVERVIEW

In recent years, there has been much interest in the use of alternate coding schemes to operate at high data rates over legacy backplanes. For example, the use of PAM-4 [9] can cut the required bandwidth in half, but with a tradeoff on Signal-to-Noise Ratio (SNR). The choice of whether NRZ or PAM-4 is more advantageous would depend on the bit rate and the backplane’s frequency response [10]. We have studied the characteristics of a number of backplanes, and concluded that for legacy backplanes, designed for 3.125-Gb/s operation, NRZ coding is more advantageous than PAM-4 for transmission in the 6-Gb/s range.
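The bandwidth-versus-SNR trade-off can be illustrated with a short calculation. This is a minimal sketch, assuming ideal signaling with equal peak-to-peak swing for both codes, so that each PAM-4 eye is one third the height of the NRZ eye. The function names are illustrative and do not come from the paper.

```python
import math

def nyquist_ghz(bit_rate_gbps, bits_per_symbol):
    """Nyquist frequency in GHz: half the symbol rate."""
    return bit_rate_gbps / bits_per_symbol / 2.0

def pam_eye_penalty_db(levels):
    """Eye-height penalty of PAM-L relative to NRZ at equal peak swing.

    Assumption: L levels split the swing into (L - 1) equal eyes.
    """
    return 20.0 * math.log10(levels - 1)

rate = 6.25                            # Gb/s
nrz_nyquist = nyquist_ghz(rate, 1)     # 3.125 GHz
pam4_nyquist = nyquist_ghz(rate, 2)    # 1.5625 GHz
penalty_db = pam_eye_penalty_db(4)     # roughly 9.5 dB
```

Under these assumptions, PAM-4 only pays off when the channel loses noticeably more than about 9.5 dB between the two Nyquist frequencies, which depends on the backplane's frequency response as noted above.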

The theoretical basis for this conclusion is a BER estimation methodology, which incorporates the characteristics of the main channels and crosstalk aggressors, as well as jitter sources. A similar methodology is implemented in StatEye (www.stateye.org) developed within OIF. Given the extremely low BER demanded by equipment vendors, it is not practical to evaluate an architecture by using transient simulation. Rather, a statistical technique such as StatEye can efficiently provide insight into the factors dominating BER performance and serve as a guide to architectural decisions.
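The idea behind a statistical BER estimate can be sketched in a few lines. This is a deliberately simplified illustration, not the StatEye algorithm and not the authors' methodology: it assumes independent, equiprobable binary symbols, a fixed sampling phase, Gaussian noise of standard deviation `sigma`, and it ignores crosstalk and jitter. All names and numeric values are hypothetical.

```python
import math
from collections import defaultdict

def isi_pdf(taps):
    """Probability distribution of total ISI from residual channel taps.

    Each tap contributes +h or -h with probability 1/2 (assumption:
    independent random data), so the distribution is built by repeated
    two-point convolution.
    """
    pdf = {0.0: 1.0}
    for h in taps:
        nxt = defaultdict(float)
        for value, prob in pdf.items():
            nxt[round(value + h, 12)] += 0.5 * prob
            nxt[round(value - h, 12)] += 0.5 * prob
        pdf = dict(nxt)
    return pdf

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber(cursor, taps, sigma):
    """BER for a transmitted +1: average Q over every ISI outcome."""
    return sum(p * q_function((cursor + v) / sigma)
               for v, p in isi_pdf(taps).items())

clean = ber(1.0, [], 0.1)             # no ISI
with_isi = ber(1.0, [0.2, 0.1], 0.1)  # two residual ISI taps
```

Because the distribution is computed analytically rather than by simulating bits, error rates far below what a transient simulation could reach are obtained directly.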

A block diagram of the resulting 6.25-Gb/s transceiver is shown in Fig. 1. The clock synthesis phase-locked loop (PLL) generates a line rate clock in the range of 4.9 GHz to 6.4 GHz. The PLL output is fed to the Multi-Clock Generator unit, which creates a number of lower rate clocks for other blocks in the transceiver. The parallel-to-serial converter (PISO) converts an N-bit word from the digital core into a line rate serial stream. A transmitter (TX) with a single pre-emphasis tap is used in the architecture to drive the backplane. While other published architectures have used extensive linear equalization techniques [1]-[5] (whether implemented in the RX as filtering or in the TX as pre-emphasis), this development avoids putting too much weight on linear filtering techniques. Linear filtering amplifies crosstalk. Due to the differences in frequency content of the forward and crosstalk channels, overall SNR is degraded by linear filtering such as pre-emphasis. The role of pre-emphasis in this architecture is therefore reduced to the minimum required to overcome other implementation challenges such as the dynamic range of the nonlinear RX equalizer, and the ability to recover clock from an unequalized signal.

[Figure 1 labels: Parallel to Serial Converter; TX; RX; Decision Feedback Equalizer; Serial to Parallel Converter; 4.9 GHz - 6.4 GHz PLL; Multi-Clock Generator; LMS Adaptation Controller; Digital Clock Recovery Unit; N bits @ rate/N; 4.9 Gb/s to 6.4 Gb/s; rate/2; Serial Diagnostic Loopback; Metallic Loopback]

Fig. 1. Block Diagram of the 6.25-Gb/s transceiver

The receiver (RX) buffers and amplifies the incoming signal, which is often attenuated and distorted after traveling through a long backplane. To compensate for distortion due to Inter-Symbol Interference (ISI) and to maximize SNR in the presence of crosstalk, a Decision Feedback Equalizer (DFE) is used. A DFE is a non-linear equalizer and therefore does not amplify crosstalk. Accordingly, it provides superior performance in crosstalk-heavy environments such as those that use high-density backplane connectors. The DFE coefficients are set using a least-mean-square (LMS) adaptation controller. After equalization, the received signal is presented to a data slicer. The clock recovery unit regenerates a bit clock from the received signal, which is used as a sampling clock for the data slicer. The clock is recovered from the unequalized signal to prevent complicated interaction with DFE adaptation.

III. CLOCK SYNTHESIS AND GENERATION

Clock synthesis is performed using a traditional PLL multiplier architecture. To achieve low random jitter, an LC Voltage Controlled Oscillator (VCO) is used. The VCO uses an integrated inductor with patterned ground shield and two accumulation-mode MOS varactors as shown in Fig. 2(a). Cross-coupled NMOS and PMOS pairs are employed in the VCO to compensate for the losses in the LC tank.

[Figure 2 labels: (a) L; Cvar; VC; M1, M2; M3, M4; IBIAS. (b) high VCO; low VCO; SELVCOH; VC; IBIAS (250 uA); VCO Buffers; cmlAL implementation]

Fig. 2. (a) LC VCO circuit, (b) Two VCOs with overlapping tuning range used in clock synthesis PLL

Although LC VCOs provide excellent phase noise and jitter performance, one disadvantage is the limited tuning range. A simple LC VCO cannot tune over 4.9 GHz to 6.4 GHz reliably over process, voltage, and temperature (PVT). In this design, we have elected to use two LC VCOs with overlapping tuning ranges. One VCO is tuned to cover the low band (4.9 GHz to 5.8 GHz), while the other is tuned for the high band (5.7 GHz to 6.5 GHz). The difference between the two VCOs is the number of varactors in their LC tank.
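The band selection implied by the two overlapping tuning ranges can be sketched as follows. The band edges are taken from the text; the selection policy (pick the VCO whose band leaves the larger margin to its nearest edge) and the function names are assumptions for illustration, since the paper does not describe how the select signal is driven.

```python
LOW_BAND_GHZ = (4.9, 5.8)    # low-band VCO range, from the text
HIGH_BAND_GHZ = (5.7, 6.5)   # high-band VCO range, from the text

def edge_margin(f_ghz, band):
    """Distance from f to the nearest band edge (negative if outside)."""
    low_edge, high_edge = band
    return min(f_ghz - low_edge, high_edge - f_ghz)

def select_high_vco(f_ghz):
    """Return True to enable the high-band VCO, False for the low band.

    Hypothetical policy: inside the 5.7 GHz to 5.8 GHz overlap, prefer
    the VCO with more tuning margin.
    """
    m_low = edge_margin(f_ghz, LOW_BAND_GHZ)
    m_high = edge_margin(f_ghz, HIGH_BAND_GHZ)
    if max(m_low, m_high) < 0:
        raise ValueError(f"{f_ghz} GHz is outside both VCO bands")
    return m_high > m_low
```

For example, `select_high_vco(6.25)` returns `True` and `select_high_vco(5.0)` returns `False`. The 100-MHz overlap is what guarantees that no target frequency falls into a gap between the bands over PVT.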

The PLL output goes to the Multi-Clock Generator, which divides the 6-GHz range clock into a number of lower rate clock phases. The design is based on current-mode logic with active peaked loading (cmlAL) for increased bandwidth. The differential pair and active load transistors are all low-Vt nMOS with pwell in deep nwell. The clocks are buffered and distributed to multiple TX/RX links on the device. These buffers can support up to 8 TX/RX pairs while operating at the highest line rate of 6.4 Gb/s.

IV. HIGH-SPEED TRANSMITTER

The transmit block consists of a Parallel-to-Serial Converter (PISO) and a high-speed transmitter. The PISO accepts an N-bit word from the digital core and converts it into a serial data stream of up to 6.4 Gb/s. It requires several clocks from the Multi-Clock Generator. A word-rate clock is used to latch in the digital word. Subsequent higher rate clocks are then used to serialize the data. The PISO drives a Current Mode Logic (CML) driver, shown in Fig. 3.
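A behavioral view of the serialization step is sketched below. The bit ordering (least-significant bit first) and the word width of 16 used in the example are assumptions; the paper specifies only an N-bit word clocked at rate/N.

```python
def piso(word, n_bits):
    """Serialize an n-bit word into a list of bits, LSB first (assumed order)."""
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    return [(word >> i) & 1 for i in range(n_bits)]

def word_clock_mhz(line_rate_gbps, n_bits):
    """Word-rate clock frequency: line rate divided by the word width."""
    return line_rate_gbps * 1000.0 / n_bits

bits = piso(0b1011, 4)               # [1, 1, 0, 1]
f_word = word_clock_mhz(6.4, 16)     # 400 MHz for a hypothetical N = 16
```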

[Figure 3 labels: PISO; TPD[0], TPD[1], ..., TPD[N]; Clocks; Main Path; Pre-emphasis Path; TX CML Driver; TXOP; TXON]

Fig. 3. The 6.25-Gb/s CML Transmitter

The transmitter incorporates an internal termination and can be programmed to provide variable swing from 0.4 to 1 Vpp differential (under worst-case conditions) with programmable one-tap pre-emphasis.

One-tap pre-emphasis is a first-order compensation for channel attenuation. Even though this is not sufficient on its own to reliably open the channel eye at 6.4 Gb/s, it reduces the work for the RX equalizer and also makes clock recovery easier. An adaptive biasing approach maximizes usable output swing by ensuring saturation of output devices.
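One-tap pre-emphasis can be modeled as a two-tap FIR filter acting on the transmitted symbols. The sketch below assumes a post-cursor tap with weight `alpha` and a peak amplitude normalized to `swing`; the paper does not state the tap position or weight range, so these are illustrative choices.

```python
def pre_emphasize(bits, alpha, swing=1.0):
    """Apply y[n] = swing * ((1 - alpha) * x[n] - alpha * x[n-1]).

    x[n] is +1 or -1. Transitions are driven at full swing while
    repeated bits are de-emphasized, which boosts high frequencies
    relative to low frequencies.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]   # assumption: line idles at the first symbol's level
    for s in symbols:
        out.append(swing * ((1.0 - alpha) * s - alpha * prev))
        prev = s
    return out

levels = pre_emphasize([0, 0, 1, 1, 1, 0], alpha=0.25)
# levels == [-0.5, -0.5, 1.0, 0.5, 0.5, -1.0]
```

With `alpha = 0.25` the settled level is half of the transition level, a 6-dB boost of the edges relative to long runs.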

V. RECEIVER WITH ADAPTIVE EQUALIZER

The receive section is comprised of several blocks:
• Linear receiver with gain control (RX)
• Decision Feedback Equalizer (DFE)
• LMS Adaptation Controller
• Digital Clock Recovery Unit (DCRU)

In order for the DFE and LMS adaptation to work, the receiver must amplify the incoming signal to a target level, while not limiting or clipping it. Hence, the receiver must be linear for the signal range of interest. In addition, one should expect this transceiver to operate over a variety of backplane lengths. As a result, the receiver may see the full transmitter swing (for short traces) or a small signal with lots of ISI (for long backplanes). In order for the receiver to cope with this range of signals, a variable gain control is implemented to set the target internal swing at around 500 mVpp differential.
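The gain range this implies can be estimated with a one-line calculation. The 500-mVpp target comes from the text; the two input amplitudes in the example are hypothetical values chosen to represent a short trace and a long backplane.

```python
import math

TARGET_MVPP = 500.0   # target internal swing, from the text

def required_gain_db(input_mvpp, target_mvpp=TARGET_MVPP):
    """Voltage gain in dB needed to bring the input swing to the target."""
    return 20.0 * math.log10(target_mvpp / input_mvpp)

short_trace_db = required_gain_db(1000.0)  # full TX swing: about -6 dB
long_trace_db = required_gain_db(50.0)     # heavily attenuated: +20 dB
```

Under these illustrative inputs the receiver would need roughly 26 dB of gain range, including attenuation for the short-trace case, while remaining linear.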

Clock recovery is performed on the unequalized input signal. This decouples the clock recovery and DFE coefficient adaptation feedback loops, making for a robust and more deterministic system. Clock recovery can be accomplished on the unequalized signal despite the fact that the eye is closed to probabilities of 10^-4 in some channels.

In this implementation, a digital bang-bang architecture is used for clock recovery. A phase interpolator modulates quadrature half-rate clocks to track the incoming data stream. Digital filters are used to maximize controllability and observability during test. An optimal sampling point in the eye is chosen to minimize pre-cursor ISI while recovering the data. This sampling point can have drastic effects on the BER.
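A bang-bang loop of this kind can be sketched behaviorally. The example assumes an Alexander-style phase detector that compares an edge sample with the data samples on either side, followed by a simple vote accumulator that steps a phase interpolator code. The detector type, the 64-step interpolator, and the threshold of 4 are assumptions for illustration and are not taken from the paper.

```python
def bang_bang_pd(d_prev, edge, d_curr):
    """Early/late vote from two data samples and the edge sample between them.

    Returns +1 if the clock is early (edge sample still equals the old bit),
    -1 if it is late, and 0 when there is no data transition.
    """
    if d_prev == d_curr:
        return 0
    return 1 if edge == d_prev else -1

def track(votes, phase=0, steps=64, threshold=4):
    """Digital loop filter: accumulate votes and step the interpolator code.

    An early clock is delayed by one step, a late clock is advanced.
    """
    acc = 0
    for v in votes:
        acc += v
        if acc >= threshold:
            phase = (phase + 1) % steps
            acc = 0
        elif acc <= -threshold:
            phase = (phase - 1) % steps
            acc = 0
    return phase
```

For instance, `track([1] * 8)` moves the phase code from 0 to 2, while an alternating early/late sequence leaves it unchanged. Keeping the filter digital is what makes the loop state observable and controllable during test.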

Although the clock is recovered and the signal is amplified to the desired level, it still is typically severely distorted due to ISI. The DFE is used to remove most post-cursor ISI components. Fig. 4 shows a block diagram of the DFE.

[Figure 4 labels: RX; Σ; Z^-1 delay chain; coefficients b0, b1, ..., b9; DFE (analog); LMS Coefficient Adaptation (digital); RDAT; RCLK]

Fig. 4. Decision Feedback Equalizer (DFE)

Note that this DFE can remove up to 10 post-cursor ISI components, which was found to be sufficient for a great majority of backplanes. Since the impulse response of the backplane is not known a priori, the DFE coefficients must be set automatically by some mechanism. This is accomplished using the LMS adaptation controller. The controller uses the sign-sign Least Mean Squares (LMS) algorithm, described by Eq. (1) below:

$$C_k[t+1] = C_k[t] + \mu_{step} \cdot \mathrm{sgn}(e[t]) \cdot \mathrm{sgn}(d[t-k-1]) \tag{1}$$

for k = 0 to (N-1), where C_k[t] is equalizer coefficient number k at time t, µ_step is the adaptation step size, e[t] is the equalizer output error at time t, d[t] is the equalizer output value at time t, and N is the number of equalizer taps.
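The update rule in Eq. (1) can be exercised in a small behavioral simulation. This is a sketch under stated assumptions, not the authors' implementation: the feedback is subtracted at the summing node, the error is measured against a target level of 1.0, the channel is noiseless, and the channel taps, step size, and function names are hypothetical.

```python
import random

def sgn(x):
    """Sign function returning -1, 0, or +1."""
    return (x > 0) - (x < 0)

def adapt_dfe(channel, n_taps, mu, n_symbols, target=1.0, seed=1):
    """Adapt DFE coefficients with sign-sign LMS on random binary data.

    channel[0] is the cursor; channel[1:] are post-cursor ISI taps.
    """
    rng = random.Random(seed)
    coeffs = [0.0] * n_taps
    sent = [1] * max(len(channel), n_taps + 1)   # newest symbol first
    decided = [1] * n_taps                       # d[t-1], d[t-2], ...
    for _ in range(n_symbols):
        sent = [rng.choice((-1, 1))] + sent[:-1]
        x = sum(h * s for h, s in zip(channel, sent))        # received sample
        y = x - sum(c * d for c, d in zip(coeffs, decided))  # summing node
        d_now = 1 if y >= 0 else -1                          # data slicer
        e = y - target * d_now                               # output error
        for k in range(n_taps):
            # Eq. (1): decided[k] holds d[t-k-1]
            coeffs[k] += mu * sgn(e) * sgn(decided[k])
        decided = ([d_now] + decided)[:n_taps]
    return coeffs

coeffs = adapt_dfe([1.0, 0.4, 0.2], n_taps=2, mu=0.001, n_symbols=20000)
```

In this toy setup the coefficients settle close to the post-cursor taps 0.4 and 0.2 and then dither by about one step size, which is the usual behavior of a sign-sign update. No training sequence is involved: the slicer decisions themselves drive the adaptation.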

For pseudo-random input data, the LMS controller does not require a training sequence. This means it can adapt continuously to changing environments, such as temperature fluctuations, while passing system traffic. As with clock recovery, using a digital LMS controller provides great flexibility for test.

To save area and power, a single LMS engine is used in a patent-pending way for adapting all receive-side parameters including DFE coefficients, RX gain, and offset compensation [11]. Both the clock recovery and DFE are implemented using a half-rate architecture to ease timing.

VI. MEASUREMENT RESULTS

A 4.9-Gb/s to 6.4-Gb/s backplane serializer/deserializer (SERDES) incorporating this transceiver was fabricated in early 2004 in a 0.13µm CMOS process with 1.2V and 2.5V power supplies, standard- and low-Vt transistors and deep-nwell option, and housed in a 320-pin flip-chip package. Fig. 5 shows the block diagram (top) and die micrograph (bottom) of the SERDES. In a typical application it multiplexes eight 3.125-Gb/s streams onto four 6.25-Gb/s channels. The chip power is 3 W total, which includes 258 mW per link for the 6.25-Gb/s interface and 123 mW per link for the 3.125-Gb/s interface, with the remaining nearly 1 W consumed by the digital core. Extensive lab characterization was performed, including BER measurements over different backplanes.

Fig. 6 shows the measured eye diagram and intrinsic jitter, taken right at the transmitter output. In this measurement, the transmitter is generating a PRBS-31 pattern at 6.25 Gb/s. The peak-to-peak jitter is shown to be 22 ps, or 0.14 UI.
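The conversion from picoseconds to unit intervals (UI) follows directly from the bit period. A minimal sketch, with illustrative function names:

```python
def unit_interval_ps(bit_rate_gbps):
    """Bit period in picoseconds for a given line rate in Gb/s."""
    return 1000.0 / bit_rate_gbps

def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter value as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)

ui = unit_interval_ps(6.25)          # 160 ps
jitter = jitter_in_ui(22.0, 6.25)    # 0.1375 UI, quoted as 0.14 UI
```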

BER testing over backplanes of various lengths was performed. Backplanes made from FR4 material, available from Tyco International, were used for these tests. These Tyco “legacy” backplanes are intended to emulate the existing 3.125-Gb/s XAUI systems deployed on the market today, which were not originally designed for such high rates.

Three different lengths of backplane were used in these tests: 5”, 20”, and 34”. The 34” backplane, together with connectors and daughter cards, provides the most severe attenuation out of the three and also contained several NEXT and FEXT (near- and far-end crosstalk) aggressors. The 6.4-Gb/s SERDES operated error-free over all three lengths. In particular, over 200 hours of continuous error-free operation was achieved on the 34” backplane, which translates into a BER better than 1E-15. The part also demonstrated error-free operation on numerous external 3.125-Gb/s backplanes. Furthermore, precise digital internal margining techniques have allowed the extrapolation of the BER beyond 1E-18, which is not practical to measure in a lab setting. These internal margining techniques can be used to depict the effective post-sampler eye (a more accurate and stringent figure of merit than RX eye since it includes sampler non-idealities such as recovered clock jitter and receiver offsets). This post-sampler eye is depicted in Fig. 7. The spot near the middle indicates the sampling location. It can be seen that, while the voltage offset compensation appears to be working properly, a slight internal phase mismatch is present, although its effect on the overall performance is very small. This illustrates how this powerful feature can be used to diagnose subtle circuit and system problems.
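The link between 200 error-free hours and a BER below 1E-15 can be checked with a standard zero-error confidence bound. The sketch assumes independent bit errors and a 95% confidence level; the confidence level is a choice made here, not one stated in the paper.

```python
import math

def bits_transferred(rate_gbps, hours):
    """Total number of bits sent at a given line rate over a test duration."""
    return rate_gbps * 1e9 * hours * 3600.0

def ber_upper_bound(n_bits, confidence=0.95):
    """Upper bound on BER after n_bits with zero observed errors.

    From (1 - BER)^n <= 1 - confidence, i.e. BER <= -ln(1 - confidence) / n.
    """
    return -math.log(1.0 - confidence) / n_bits

n_bits = bits_transferred(6.4, 200)   # about 4.6e15 bits
bound = ber_upper_bound(n_bits)       # about 6.5e-16
```

Measuring 1E-18 the same way would take roughly a thousand times longer, which is why extrapolation from internal margining is used instead.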

[Figure 5 labels: JTAG Test Access Port; Management Interface; Clock Synthesizer; REFCLK; XTAL; Common Control Logic; 3G RX (RX3G0_P/N ... RX3G7_P/N); 3G Data & Clock Recovery; Tx FIFO/Mux; Parallel to Serial Conversion (PISO); 6G TX (TX6G0_P/N ... TX6G3_P/N); 6G RX (RX6G0_P/N ... RX6G3_P/N); 6G Data & Clock Recovery; Rx FIFO/Demux; Parallel to Serial Conversion (PISO); 3G TX (TX3G0_P/N ... TX3G7_P/N)]

Fig. 5. Block diagram (top) and micrograph (bottom) of the 4.9-Gb/s to 6.4-Gb/s SERDES chip in 0.13µm CMOS

Fig. 6. Measured Transmit Eye and Jitter at 6.25 Gb/s

Fig. 7. Internal post-sampler eye at 6.4 Gb/s over 34” legacy backplane.

VII. CONCLUSION

A 4.9- to 6.4-Gb/s SERDES using conventional NRZ coding is implemented in a standard 0.13µm CMOS process. Using an adaptive Decision Feedback Equalizer technique, the chip is able to demonstrate a BER better than 1E-15 over a 34” (86-cm) legacy FR4 backplane. Each 6.4-Gb/s link consumes 285 mW. The product is compliant with the OIF CEI-6G LR (long reach) standard.

ACKNOWLEDGMENT

The authors would like to acknowledge the hard work of many additional PMC-Sierra staff, in particular Ognjen Katic and Dr. Yuriy Greshishchev who made significant contributions to make this device possible.

REFERENCES

[1] V. Balan et al., “A 4.8–6.4 Gbps Serial Link for Backplane Applications using Decision Feedback Equalization,” Proc. IEEE CICC, Oct. 2004, pp. 31-34.

[2] M. Sorna, T. Beukema, et al., “A 6.4Gb/s CMOS SerDes core with Feedforward and Decision-Feedback Equalization,” ISSCC Dig. Tech. Papers, Feb. 2005, pp. 62-63.

[3] P. Landman et al., “A Transmit Architecture with 4-Tap Feedforward Equalization for 6.25/12.5Gb/s Serial Backplane Communications,” ISSCC Dig. Tech. Papers, Feb. 2005, pp. 66-67.

[4] N. Krishnapura et al., “A 5Gb/s NRZ Transceiver with Adaptive Equalization for Backplane Transmission,” ISSCC Dig. Tech. Papers, Feb. 2005, pp. 60-61.

[5] K. Krishna et al., “A 0.6 to 9.6Gb/s Binary Backplane Transceiver Core in 0.13µm CMOS,” ISSCC Dig. Tech. Papers, Feb. 2005, pp. 64-65.

[6] R. Payne et al., “A 6.25Gb/s Binary Adaptive DFE with First Post-Cursor Tap Cancellation for Serial Backplane Communications,” ISSCC Dig. Tech. Papers, Feb. 2005, pp. 68-69.

[7] S. Wu et al., “Design of a 6.25Gb/s Backplane SerDes with Top-down Design Methodology,” DesignCon 2004, Feb. 2004.

[8] OIF CEI 02.0, “Common Electrical I/O (CEI) – Electrical and Jitter Interoperability Agreements for 6G+ bps and 11G+ bps I/O,” February 28, 2005.

[9] Sonntag et al., “An Adaptive PAM-4 5Gb/s backplane transceiver in 0.25µm CMOS,” Proc. IEEE CICC, 2002, pp. 363-366.

[10] H. Johnson, High-Speed Digital Design: A Handbook of Black Magic, Prentice Hall, NJ, 1993.

[11] US patent application 10/725,183, “A Flexible Adaptation Engine for Adaptive Transversal Filters”.