Real-time FPGA Implementation of Transmitter Based DSP
Philip Watts(1,2), Robert Waegemans(2), Yannis Benlachtar(2), Polina Bayvel(2), Robert Killey(2)
(1) Computer Laboratory, University of Cambridge, UK, [email protected]
(2) Optical Networks Group, Dept of Electronic and Electrical Engineering, University College London, UK, [email protected]
20+ GSa/s DSP Implemented on FPGAs
• Mask set for 45 nm CMOS >$1M and rising
– Shuttle runs are inflexible and require high investment in CAD tools
• FPGAs allow researchers to implement real time DSP for optical communications research
– Investigate and verify algorithms
– Allow long measurement times for accurate BER
– Product prototyping
• Paper will describe considerations for >20 GSa/s real time DSP in transmitter based applications
– 10.7 Gb/s electronic predistortion (EPD)
– Higher order modulation formats: OFDM
– Potential for higher bit/sample rates
Growing Power of FPGAs
• Growing numbers of logic cells, RAM and DSP blocks enable increasingly sophisticated DSP
• High speed serial transceivers (MGT) are crucial to achieve high DSP sample rates
• Source: Xilinx, although other FPGA vendors show similar trends
Design Considerations
• Off-chip bandwidth
– 21.4 GSa/s DSP with 4-6 bits output requires 85.6 – 128.4 Gb/s bandwidth
→ Possible since 2006 (Virtex-4)
– External time division multiplexing is usually required as maximum serial rate is 11.2 Gb/s
• Clock rate and parallelism
– Due to the mismatch between DSP throughput and FPGA clock rate (a few hundred MHz), a high degree of parallelism is required
– Constrained by:
→ Timing closure (easier with higher parallelism)
→ Resource use (increases with parallelism)
→ MGT input data width
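The off-chip bandwidth figures above follow from simple arithmetic, sketched below. The function names are illustrative, and the sketch assumes one serial bit stream per DAC bit running at the full sample rate, which the slide implies but does not state.

```python
import math

def off_chip_bandwidth_gbps(sample_rate_gsa, bits_per_sample):
    """Aggregate output bandwidth in Gb/s, assuming one bit stream per DAC bit."""
    return sample_rate_gsa * bits_per_sample

def min_tdm_factor(dac_line_rate_gbps, max_serial_rate_gbps):
    """Smallest integer N:1 multiplexing ratio that keeps each serial
    lane at or below its maximum rate."""
    return math.ceil(dac_line_rate_gbps / max_serial_rate_gbps)

low = off_chip_bandwidth_gbps(21.4, 4)    # 4-bit output
high = off_chip_bandwidth_gbps(21.4, 6)   # 6-bit output
tdm = min_tdm_factor(21.4, 11.2)          # each DAC bit line assumed to run at 21.4 Gb/s

print(f"{low:.1f} to {high:.1f} Gb/s off-chip, external TDM of at least {tdm}:1")
```

Because each DAC bit line would need 21.4 Gb/s and a serial lane tops out at 11.2 Gb/s, at least 2:1 external multiplexing is needed under these assumptions.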
An FPGA Design for 21.4 GSa/s, 4-bit DSP
• 16 MGT operating at 5.35 Gb/s used to achieve off-chip bandwidth of 85.6 Gb/s
– Requires external 4:1 TDM and DAC
• Iterative process to determine clock rate of 167.25 MHz
– MGT data width = 32
– 64 bits processed in parallel
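As a rough consistency check of the numbers on this slide, the sketch below recomputes them from the lane count, lane rate and data width. It assumes that each MGT consumes one 32-bit word per fabric clock cycle and that the payload is the 10.7 Gb/s data stream named elsewhere in the talk. The constant names are illustrative.

```python
MGT_COUNT = 16
MGT_RATE_BPS = 5_350_000_000     # 5.35 Gb/s per lane
MGT_DATA_WIDTH = 32              # bits handed to each MGT per fabric clock (assumed)
TDM_RATIO = 4                    # external 4:1 multiplexers
DATA_RATE_BPS = 10_700_000_000   # 10.7 Gb/s payload (from the talk's EPD application)

total_bps = MGT_COUNT * MGT_RATE_BPS              # aggregate off-chip bandwidth
dac_bit_lines = MGT_COUNT // TDM_RATIO            # lanes left after multiplexing
dac_line_rate_bps = MGT_RATE_BPS * TDM_RATIO      # rate on each DAC bit line
fabric_clock_hz = MGT_RATE_BPS / MGT_DATA_WIDTH   # clock needed to keep a lane fed
bits_per_clock = DATA_RATE_BPS / fabric_clock_hz  # payload bits handled per cycle

print(total_bps / 1e9, "Gb/s total")
print(dac_bit_lines, "DAC bit lines at", dac_line_rate_bps / 1e9, "Gb/s")
print(round(fabric_clock_hz / 1e6, 2), "MHz fabric clock,", bits_per_clock, "bits per clock")
```

Under these assumptions the clock comes out at about 167.2 MHz, close to the 167.25 MHz quoted on the slide, with 64 payload bits per cycle and four 21.4 Gb/s lines for the 4-bit DAC.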
Digital-to-Analog Converters (DAC)
• Access to high speed DAC (and ADC) technology has been problematic
• UCL 21.4 GSa/s 4-bit DAC is constructed from discrete components
– Not scalable beyond 4 bits without adding amplifiers
– Time alignment with FPGA using test signals and manual process
• Micram 25+ GSa/s DAC
– 5.5 bit effective <6.25 GHz
– Automated time alignment
UCL Discrete Component DAC
Micram DAC interfaced to FPGA, from R.Waegemans et al, Optics Exp 17 (2009)
10.7 Gb/s EPD Transmitter
• Two FPGA/DAC for processing real and imaginary parts of field
– Each FPGA processes independently
– Board synchronisation required at pattern and output bit level
• 4-bit discrete DAC used
• Cartesian MZM to generate required field
P.M.Watts et al., Opt Exp 16, 12171 (2008)
EPD Transmitter Hardware
EPD using Linear Compensation
• FIR filters for chromatic dispersion compensation
– 128 parallel FIRs, pipelined over 5 clock cycles
– At transmitter, FIR inputs are 1-bit
→ Low resource use
→ Time domain is most efficient up to 200 taps
– Exploit symmetrical impulse response
• 55-tap FIRs
– 71% of logic resources used on each Virtex-4 FX100
P.M.Watts et al., Opt Exp 16, 12171 (2008)
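The two resource-saving ideas on this slide (1-bit inputs and a symmetrical impulse response) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FPGA design from the cited paper: the taps are real-valued and chosen arbitrarily, the filter is sequential rather than parallel and pipelined, and the function names are hypothetical.

```python
def fir_direct(bits, taps):
    """Reference FIR for 1-bit inputs: with inputs of 0 or 1, every
    multiplication reduces to 'add the tap or skip it'."""
    n_taps = len(taps)
    out = []
    for n in range(n_taps - 1, len(bits)):
        acc = 0
        for k in range(n_taps):
            if bits[n - k]:
                acc += taps[k]
        out.append(acc)
    return out

def fir_symmetric(bits, taps):
    """Same filter, exploiting taps[k] == taps[-1 - k]: the two inputs that
    share a coefficient are added first (giving 0, 1 or 2), so only about
    half of the coefficients are needed."""
    n_taps = len(taps)
    half = n_taps // 2
    out = []
    for n in range(n_taps - 1, len(bits)):
        acc = 0
        for k in range(half):
            pair = bits[n - k] + bits[n - (n_taps - 1 - k)]
            acc += pair * taps[k]
        if n_taps % 2:                       # unpaired centre tap
            acc += bits[n - half] * taps[half]
        out.append(acc)
    return out

taps = [1, -2, 3, -2, 1]                     # arbitrary symmetric example taps
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(fir_direct(bits, taps))
print(fir_symmetric(bits, taps))
```

Both functions return the same output sequence; the symmetric form simply shares each coefficient between two input positions.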
EPD using Non-linear Compensation
• RAM size scales exponentially with chromatic dispersion
– 22 Mb on-chip RAM available on latest FPGA
– 1 Gb required for 1200 km
– LUT compression techniques do not work on FPGA due to fixed block RAM size
• Using off-chip DRAM is a major challenge
– Memory bandwidth
– Memory access time reliability issues
R.I.Killey et al., PTL 17, 714 (2005)
P.M.Watts et al., JLT 25, 3089 (2007)
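The exponential scaling can be made concrete with a small sketch of look-up table (LUT) sizing. It assumes a single LUT addressed by a window of input bits, a hypothetical 4-bit output word, and Mb/Gb taken as powers of ten. The slide does not give the relation between fiber length and address width, and a real design would hold several parallel LUT copies, so the results are upper bounds for illustration only.

```python
def lut_bits(address_bits, word_bits):
    """Storage for one LUT: one output word per possible address."""
    return (2 ** address_bits) * word_bits

def max_address_bits(ram_bits, word_bits):
    """Largest address width whose single LUT still fits in ram_bits."""
    n = 0
    while lut_bits(n + 1, word_bits) <= ram_bits:
        n += 1
    return n

WORD_BITS = 4  # hypothetical output word width

print(lut_bits(11, WORD_BITS), "bits for an 11-bit address")
print(max_address_bits(22_000_000, WORD_BITS), "address bits fit in 22 Mb")
print(max_address_bits(1_000_000_000, WORD_BITS), "address bits fit in 1 Gb")
```

Each extra address bit doubles the storage, so under these assumptions going from 22 Mb to 1 Gb buys only five more address bits.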
10.7 Gb/s Nonlinear EPD Transmitter (in collaboration with Ericsson)
• Same EPD hardware scheme used except for:
– Virtex-4 FX140 to increase off-chip bandwidth and on-chip RAM
– Micram integrated 6-bit DACs
• LUTs with 11-bit address implemented on each FPGA
• At 450 km SSMF, less than 0.5 dB extra penalty was measured when increasing the launch power from 0 dBm to +4 dBm
R.Waegemans et al, Opt Express 17 (2009)
OFDM (in collaboration with Intel Research and Carnegie Mellon University)
• Same scheme can be used for generation of advanced modulation formats
– Real time 21.4 GSa/s OFDM generated using UCL system
• DSP block replaced with:
– QPSK encoders
– Optimised IFFT with 10-bit data path (generated through SPIRAL project)
– Digital clipping circuits
• 84% of logic slices used on the Virtex-4 FX100
• Transmitted 8.36 Gb/s digitally up-converted SSB-OFDM signal over 1600 km of uncompensated fiber with a BER < 10^-3
Y.Benlachtar et al, Accepted for Pub, Optics Exp.
Y.Benlachtar et al, Post-deadline, ECOC 2009
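The three DSP stages listed on this slide (QPSK encoding, IFFT, digital clipping) can be sketched in floating point as follows. This is a toy model under stated assumptions: a naive inverse DFT stands in for the optimised SPIRAL-generated IFFT, the transform size and clipping level are arbitrary, and the 10-bit data path, digital up-conversion and SSB generation are not modelled. All function names are hypothetical.

```python
import cmath

def qpsk_map(bits):
    """Map bit pairs to the four QPSK points (+-1 +-1j)/sqrt(2)."""
    s = 2 ** -0.5
    pairs = zip(bits[0::2], bits[1::2])
    return [complex(s * (1 - 2 * b0), s * (1 - 2 * b1)) for b0, b1 in pairs]

def idft(freq):
    """Naive O(N^2) inverse DFT, standing in for a hardware IFFT."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def clip(samples, limit):
    """Clamp real and imaginary parts to +-limit, as a digital clipper would."""
    def clamp(v):
        return max(-limit, min(limit, v))
    return [complex(clamp(z.real), clamp(z.imag)) for z in samples]

bits = [0, 0, 0, 1, 1, 0, 1, 1]     # four subcarriers, two bits each
symbols = qpsk_map(bits)            # frequency-domain QPSK symbols
time_samples = idft(symbols)        # one OFDM symbol in the time domain
clipped = clip(time_samples, 0.3)   # limit peaks before the DAC

for z in clipped:
    print(f"{z.real:+.3f} {z.imag:+.3f}j")
```

Clipping limits the peak amplitude of the OFDM waveform, which helps when the DAC has only a few bits of resolution.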
Options for Increased 10 Gb/s Compensation
• Use a higher proportion of logic resources
– Timing closure becomes increasingly difficult approaching 100% resource use
• DSP blocks offered on high-end FPGAs useful in some applications
– Timing issues routing to fixed blocks
– Limited number of MACs for highly parallel applications
• Use of dispersion tolerant modulation formats
– Investigate resource use of additional circuits
→ Differential encoding
→ Multi-level coding increases DSP complexity
→ Additional filters (duobinary)
• Rely on Moore’s Law and use the latest FPGAs
Options for Increased 10 Gb/s Compensation
• Between ‘04 and ’09
– Logic cells x3.8
– RAM x2.3
• Linear scaling of FIR enables EPD with 210 taps
– 4000 km SSMF
– Maintaining 71% usage
• Increase in RAM not sufficient to significantly increase non-linear compensation
– LUT compression [Killey OFC’06]
– Apply LUT as perturbation [McGhan OFC’06]
– Solve non-linear Schrödinger equation in real time(!) [Li et al, OptExp 2008]
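The contrast between the two growth figures can be checked with a short calculation. It assumes that FIR length scales linearly with logic cells from the 55-tap design at the same utilisation, and that LUT storage doubles with every added address bit. Both function names are illustrative.

```python
import math

def scaled_taps(base_taps, logic_growth):
    """FIR length if tap count grows in proportion to available logic."""
    return round(base_taps * logic_growth)

def extra_address_bits(ram_growth):
    """Whole LUT address bits gained when storage doubles per added bit."""
    return math.floor(math.log2(ram_growth))

print(scaled_taps(55, 3.8), "taps after x3.8 logic growth")
print(extra_address_bits(2.3), "extra LUT address bit(s) after x2.3 RAM growth")
```

The first result is close to the 210 taps quoted above, while the second suggests why a 2.3-fold RAM increase adds little to LUT-based compensation: it yields only about one more address bit.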
Increased Bit/Sample Rates
• Increased sample rates require:
– Increased off-chip bandwidth, DAC conversion rate and DSP throughput
→ scaling linearly with bit rate
– and processor memory
→ scaling quadratically
• Off-chip bandwidth of latest FPGAs allows 40 Gb/s NRZ-OOK EPD
– BUT, 80 GSa/s DAC required!
• Multilevel modulation formats combined with continuous growth in logic cell density with each FPGA generation
– Back of envelope: ≈400 km SSMF with 42.8 Gb/s linear DQPSK EPD with latest FPGA
• Highly dispersion tolerant formats such as OFDM do not require processor memory scaling
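The scaling rules on this slide can be written out as a small calculation. The sketch assumes two samples per symbol, which is consistent with the 21.4 GSa/s used for 10.7 Gb/s elsewhere in the talk but is not stated for DQPSK, and it takes "quadratic" to mean the square of the bit-rate ratio. The function names are illustrative.

```python
def dac_sample_rate_gsa(bit_rate_gbps, bits_per_symbol, samples_per_symbol=2):
    """DAC conversion rate needed for a given bit rate and modulation format."""
    return bit_rate_gbps / bits_per_symbol * samples_per_symbol

def memory_growth(old_bit_rate_gbps, new_bit_rate_gbps):
    """Relative processor memory if it scales with the square of the bit rate."""
    return (new_bit_rate_gbps / old_bit_rate_gbps) ** 2

print(dac_sample_rate_gsa(10.7, 1), "GSa/s for 10.7 Gb/s NRZ-OOK")
print(dac_sample_rate_gsa(40, 1), "GSa/s for 40 Gb/s NRZ-OOK")
print(dac_sample_rate_gsa(42.8, 2), "GSa/s for 42.8 Gb/s DQPSK")
print(memory_growth(10.7, 42.8), "times the memory at four times the bit rate")
```

With two bits per symbol, DQPSK halves the required DAC rate for a given bit rate, which is the motivation for the multilevel formats mentioned above.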
Thank You for Your Attention
Acknowledgements
• Funding and support:
– Royal Commission for the Exhibition of 1851
– UK Engineering and Physical Sciences Research Council
– The Royal Society
– Intel Corporation
– Huawei Technologies
– Ericsson GmbH
• The authors are grateful for the contributions to the work from
– Dr Madeleine Glick (Intel Research)
– Dr Stefan Herbst and Dr Cornelius Fürst (Ericsson GmbH)
– Prof. Markus Püschel, Prof. James Hoe, Peter Milder and Robert Koputsoyannis (Carnegie Mellon University)