4.4 Implementation Structures in FPGAs and DSPs


Presented by Lee Pucker, President, ForwardLink Consulting

Agenda

Case Study on Implementation Structures: Synchronization in a GSM Network

Option 1: DSP Implementation of a GSM Sync Burst Matched Correlator
Option 2: FPGA Implementation of a GSM Sync Burst Matched Correlator
Discussion on Trade-offs in Device Selection
Conclusions

Typical problem – Synchronization on a GSM network

The Mobile Station and Base Station Terminals in a GSM network contain multiple clocks that run asynchronously

For these radios to communicate, they need to synchronize their clocks

This Problem is Addressed Through the GSM Logical Channel and Frame Structure

GSM Signals Are Transmitted in “Bursts”

1 Burst per Time Slot
Each time slot contains 156.25 bits at 270.833 kbits/sec
8 Time Slots per Frame
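The burst timing implied by these numbers can be worked out directly. The sketch below is only an illustration: it assumes the bit rate is exactly 13 MHz / 48 (which rounds to the 270.833 kbits/sec quoted above), and the constant names are chosen here for readability.

```python
from fractions import Fraction

# Assumption: 270.833 kbit/s is taken as exactly 13 MHz / 48.
BIT_RATE_HZ = Fraction(13_000_000, 48)
BITS_PER_SLOT = Fraction(625, 4)   # 156.25 bit periods per time slot
SLOTS_PER_FRAME = 8

slot_s = BITS_PER_SLOT / BIT_RATE_HZ       # duration of one time slot
frame_s = SLOTS_PER_FRAME * slot_s         # duration of one 8-slot frame

print(f"bit rate  : {float(BIT_RATE_HZ) / 1e3:.3f} kbit/s")
print(f"time slot : {float(slot_s) * 1e6:.1f} us")
print(f"frame     : {float(frame_s) * 1e3:.3f} ms")
```

Under these assumptions a time slot lasts roughly 577 microseconds and a frame roughly 4.6 milliseconds, which sets the time scale on which the clocks must be aligned.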

One time slot in each downlink frame is used by the base station to transfer synchronization and control information

Referred to as logical channels

Every 10 frames, the base station transmits a Sync Channel (SCH) to facilitate synchronization by the mobile station with the base station

FCH SCH
BCCH BCCH BCCH BCCH CCCH CCCH CCCH CCCH FCH SCH
CCCH CCCH CCCH CCCH CCCH CCCH CCCH CCCH FCH SCH
CCCH CCCH CCCH CCCH CCCH CCCH CCCH CCCH FCH SCH
CCCH CCCH CCCH CCCH CCCH CCCH CCCH CCCH FCH SCH
CCCH CCCH CCCH CCCH CCCH CCCH CCCH CCCH IDLE

Source: ETSI 3GPP TS 45.002
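As a quick check on the layout above, the frame sequence can be written out as a list and searched for the SCH positions. This is a minimal sketch that simply transcribes the labels from the slide; the variable names are hypothetical.

```python
# Control-channel frame sequence, transcribed from the layout above.
multiframe = (
    ["FCH", "SCH"] + ["BCCH"] * 4 + ["CCCH"] * 4
    + (["FCH", "SCH"] + ["CCCH"] * 8) * 4
    + ["IDLE"]
)

# Frame numbers (0-based) in which the Sync Channel appears.
sch_frames = [i for i, ch in enumerate(multiframe) if ch == "SCH"]
# Spacing between consecutive SCH frames.
spacing = [b - a for a, b in zip(sch_frames, sch_frames[1:])]

print(len(multiframe), sch_frames, spacing)
```

The list has 51 entries and the SCH recurs every 10 frames, consistent with the statement that the base station transmits the SCH every 10 frames.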

The GSM Synchronization Channel (SCH)

The GSM Synchronization Burst contains a “training sequence” of 64 bits that is used to facilitate synchronization

Sync bits are known by both the base station and the mobile
Detecting where the synchronization burst occurs in time allows the mobile station to synchronize with the base station's baud clock and frame clock
Tracking the drift from SCH burst to SCH burst can be used by the mobile station to fine tune the receiver clock

More on this when you study Synchronization later in this course

Process for detecting the GSM SCH

Oversample the received complex baseband GSM signal by 4X

This allows us to synchronize to the base station with an accuracy of 0.25 bits

Sometimes referred to as a qbit
Baud rate = 270.833 kSymbols per second, so the sample rate is 1083.333 kSamples per second

GMSK modulate the training sequence of the SCH to create the matched filter and “upsample” by 4

Creates 64 X 4 samples = 256 samples
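The oversampling arithmetic above can be sketched as follows. It assumes a symbol rate of 13 MHz / 48 (about 270.833 kSymbols per second); the constant names are illustrative only.

```python
SYMBOL_RATE_HZ = 13e6 / 48   # assumption: ~270.833 kSymbols per second
OVERSAMPLE = 4               # 4X oversampling of the baseband signal
TRAINING_BITS = 64           # training sequence length in the sync burst

sample_rate_hz = OVERSAMPLE * SYMBOL_RATE_HZ    # ~1.0833 MSamples per second
qbit_s = 1.0 / sample_rate_hz                   # one quarter-bit (qbit) period
template_len = TRAINING_BITS * OVERSAMPLE       # matched filter length in samples

print(f"sample rate : {sample_rate_hz / 1e3:.3f} kSPS")
print(f"qbit period : {qbit_s * 1e6:.3f} us")
print(f"template    : {template_len} samples")
```

The timing resolution of the search is one qbit, a little under a microsecond under these assumptions.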

Slide the received GSM sample data against the matched filter to find the correlation peak

Matched Filter for Detecting a SCH

The Correlation Function

$$C_n = \sum_{m=0}^{255} MF(m)\, S^{*}(n+m)$$

Let MF be the Matched Filter Sequence (Length 256)
Let S be the Signal Sequence
Let C be the Correlation Sequence
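A direct, unoptimized rendering of the correlation function may help make the indices concrete. This is a sketch under stated assumptions: the 256-sample template here is a hypothetical stand-in made of random unit-magnitude phases, not a real GMSK-modulated training sequence, and the noise level and burst offset are arbitrary.

```python
import cmath
import random

def correlate(mf, s):
    """C[n] = sum over m of MF(m) * conj(S(n + m)), for every full-overlap n."""
    taps = len(mf)
    return [
        sum(mf[m] * s[n + m].conjugate() for m in range(taps))
        for n in range(len(s) - taps + 1)
    ]

def noise(sigma=0.1):
    """Small complex Gaussian noise sample (illustrative level only)."""
    return complex(random.gauss(0, sigma), random.gauss(0, sigma))

random.seed(0)
# Hypothetical template: random phases stand in for the modulated training sequence.
mf = [cmath.exp(1j * random.uniform(-cmath.pi, cmath.pi)) for _ in range(256)]

offset = 37  # arbitrary position of the burst in the received data
s = ([noise() for _ in range(offset)]
     + [x + noise() for x in mf]
     + [noise() for _ in range(50)])

c = correlate(mf, s)
peak = max(range(len(c)), key=lambda n: abs(c[n]))
print("correlation peak at n =", peak)
```

The magnitude of C peaks at the offset where the received samples line up with the template, which is the timing estimate the mobile station is looking for.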

DSP Implementation of the GSM SCH Matched Filter

The TMS320C6416 Processor

A Typical DSP – the TMS320C6416T

Single C64x fixed point DSP core
Independent L1 Program and L1 Data Cache
Built in 8 Mbit L2 Cache
Dual external memory interfaces
Viterbi and Turbo coprocessors
3 independent timers

Source: TMS320C6414T, TMS320C6415T, TMS320C6416T Fixed Point Digital Signal Processors Data Sheet (SPRF226J)

The C64x VLIW Architecture

The C64x CPU hosts dual sets of functional units each with dedicated register files

.L and .S functional units perform arithmetic, logical and branch operations
.M functional units perform two 16 bit x 16 bit multiplies per clock
.D data addressing units are responsible for data transfers between register files and memory
32 registers, each with 32 bits

Source: TMS320C64x Technical Overview

Key issue – the need to support fixed point math

Most DSPs utilize fixed point vs. floating point math
Allows for significant reduction in power and cost associated with the device

Fixed point math requires scaling to occur
16 bit * 16 bit = 32 bit to maintain full precision
16 bit + 16 bit = 17 bit to maintain full precision
If A, B, C, D are 16 bit values, then A*B + C*D requires 33 bits

Can't be supported on a processor that only does 32 bit arithmetic
Usually handled by right shifting the products to make them 31 bit numbers
This is simplified if using unsigned or sign/magnitude values versus 2's complement
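The bit growth described above can be checked with a few lines of integer arithmetic. This is a minimal sketch: `bits_needed` is a hypothetical helper written for this illustration, and the worst case shown uses the most negative 16 bit two's complement value.

```python
def bits_needed(value):
    """Smallest two's complement width (in bits) that can hold `value`."""
    n = 1
    while not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        n += 1
    return n

INT16_MIN = -(1 << 15)              # worst-case 16 bit operand

product = INT16_MIN * INT16_MIN     # 16 bit * 16 bit
sum16 = INT16_MIN + INT16_MIN       # 16 bit + 16 bit
total = product + product           # A*B + C*D at full precision
scaled = (product >> 1) + (product >> 1)  # products right shifted before the add

print("product :", bits_needed(product), "bits")
print("sum16   :", bits_needed(sum16), "bits")
print("A*B+C*D :", bits_needed(total), "bits")
print("scaled  :", bits_needed(scaled), "bits")
```

Shifting each product right by one bit gives up the least significant bit of precision but keeps the sum of two products within a 32 bit register.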

GSM Synchronization Channel Matched Filter Implementation on a TMS320C6416T DSP

During each clock cycle (order is important, and optimizations are required):

Add/subtract the previous complex products to accommodate conjugation, and add to the previous sum with appropriate scaling, using the .L and .S units
C_Real = W + X + C_Real
C_Imag = Y - Z + C_Imag

Compute new complex products using the .M units
W = MF_Real * S_Real
X = MF_Imag * S_Imag
Y = MF_Imag * S_Real
Z = MF_Real * S_Imag

Decrement m (m = m - 1) using the .S unit

If m = 0, branch using the .S unit
Save C to memory for Cn using the .D unit
Increment n
Set m = 256, reset C to 0

Load the next S_Real(n + m), S_Imag(n + m), MF_Real(m), MF_Imag(m) into the A and B registers using the .D unit

$$C_n = \sum_{m=0}^{255} MF(m)\, S^{*}(n+m)$$
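The per-tap steps on this slide can be emulated with plain integer arithmetic to see how the four real products W, X, Y, Z combine into one complex multiply-accumulate. This is a behavioral sketch only, not C64x code: the function name and the `shift` scaling parameter are assumptions made for illustration, and Python integers do not overflow, so a real implementation would still need enough scaling or a wider accumulator for 256 taps.

```python
def correlate_fixed(mf_re, mf_im, s_re, s_im, n, shift=1):
    """One correlation output C_n using four real multiplies per tap."""
    c_re = 0
    c_im = 0
    m = len(mf_re)                      # count down over the taps
    while m != 0:
        m -= 1
        # Four real products (the work assigned to the .M units on the slide)
        w = mf_re[m] * s_re[n + m]
        x = mf_im[m] * s_im[n + m]
        y = mf_im[m] * s_re[n + m]
        z = mf_re[m] * s_im[n + m]
        # Combine with conjugation and accumulate, scaling each product first
        c_re += (w >> shift) + (x >> shift)
        c_im += (y >> shift) - (z >> shift)
    return c_re, c_im

# Tiny 2-tap example with integer data
mf_re, mf_im = [2, 6], [4, -2]
s_re, s_im = [2, 4, 6], [2, -4, 0]

full = correlate_fixed(mf_re, mf_im, s_re, s_im, n=0, shift=0)
half = correlate_fixed(mf_re, mf_im, s_re, s_im, n=0, shift=1)
print("unscaled:", full, " scaled by 1/2:", half)
```

With `shift=0` the result matches the complex sum of MF(m) times the conjugate of S(n+m); with `shift=1` every product is halved before accumulation, which is the kind of scaling the fixed point discussion calls for.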

Performance

Training sequence = 64/156.25, or 0.4096, of the bits in a sync burst
For a 1 GHz clock, the filter implementation shown in the previous slide, operating with input data at 1 MSPS, will consume greater than 0.4 of the cycles for 1 pass

Only 2 passes (n = 0 and n = 1) and the processor is exhausted

The solution is to:
Reduce precision to allow more operations per cycle, or
Go to a faster processor

FPGA Implementation of the SCH Matched Filter

A typical FPGA for wireless signal processing: The Xilinx Virtex 5 SXT

Virtex 5 SXT FPGA has a number of “features” supporting wireless signal processing

DSP48E Slice
Embedded “Block RAM”
Configurable Logic Blocks

Source: Virtex-5 Family Overview LX, LXT, and SXT Platforms

The Virtex 5 SXT DSP48E Slice

Source: Virtex-5 SXT Platform Technical Backgrounder

Virtex 5 SXT Block RAM

Up to 244 36 Kbit dual-port blocks
Built-in address sequencing to support FIFO and shift register functions

Source: Virtex-5 Users Guide

Virtex 5 Configurable Logic Blocks

Primary logic resources provided by the FPGA
Each CLB has 2 “slices”
Each “slice” consists of multiple look-up tables, registers, and combinatorial logic elements

GSM Synchronization Channel Matched Filter Implementation on a Virtex 5 SXT FPGA

[Figure: Parallel matched filter structure. Signal In feeds a 256 tap shift register (S0 ... S255) created using Block RAM. Each tap is multiplied by the corresponding filter tap (MF0 ... MF255), also stored in Block RAM. The 256 products are summed to produce C Out. The multiply and accumulate functions are supported via DSP48 slices.]

Performance

Because the FPGA is an inherently parallel processor, the entire correlation can be done in 1 clock cycle

The Virtex 5 supports this at clock rates up to 550 MHz (550 MSPS)
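The parallel structure can be modeled in software as a shift register whose taps are all multiplied and summed on every clock. The sketch below is a behavioral model only, not HDL: the class and method names are hypothetical, and the 3-tap example stands in for the 256-tap filter.

```python
from collections import deque

class ParallelCorrelator:
    """Behavioral model: one new sample in, one correlation value out per clock."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Shift register holding the most recent len(taps) samples, oldest first.
        self.shift_reg = deque([0j] * len(self.taps), maxlen=len(self.taps))

    def clock(self, sample):
        self.shift_reg.append(sample)   # the oldest sample falls off the far end
        # In hardware, all multiplies and the sum happen within the same clock.
        return sum(mf * s.conjugate() for mf, s in zip(self.taps, self.shift_reg))

taps = [1 + 1j, 2 + 0j, -1j]            # illustrative 3-tap filter
samples = [1 + 0j, 1j, 2 - 1j, -1 + 0j]

corr = ParallelCorrelator(taps)
outputs = [corr.clock(x) for x in samples]
print(outputs)
```

Once the shift register has filled, each call to `clock` returns the next value of the correlation sequence, whereas the sequential loop on the DSP needs one pass over all the taps per output.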

So why wouldn’t you always use an FPGA???

Design Considerations and Device Selection

Device Selection Process

[Figure: Device selection process flowchart, with the following elements.]

Inputs:
Air Interface Specifications From Concept Definition
Selection Criteria for Processing Devices

Steps:
Develop Block Diagram Architectures Supporting Each Mode of Target Air Interfaces
Identify Algorithms for Implementing Each Defined Block
Establish Functional Requirements for Algorithms
Map Algorithms to Candidate Processing Devices and Select Devices

Intermediate results:
Block Diagrams
Block Diagrams with Functional Requirements

Feedback paths:
Modify Block Diagrams to Better Align Algorithms From Different Air Interfaces
Modify Algorithms Based on Device Constraints
Modify Block Diagrams Based on Device Constraints

Output:
Block Diagram with Functional Requirements Mapped to Selected Devices

This step may also include a “make versus buy” decision

Selection Criteria for Signal Processing Devices

Performance: Can it do the job?
Programmability: Life cycle/maintenance costs
Level of Integration: Cost per unit
Development Cycle: Cost of development
Power: Battery life

Example 4: Device Mapping for 8-PSK Demod

Figure 7a: Base Architecture

[Figure: Block diagram with the following elements.]
Input: IF Signal From RF Subsystem (65 MSPS)
Digital Down Converter, Decimate by 30 (< 65 ops/sample)
Resampler, Decimate by 2 +/- ε (< 62 ops/sample)
Quadrature Phase Detector (< 100 ops/sample)
Symbol Rate Detector (< 200 ops/sample)
Carrier Rate Detector (< 100 ops/sample)
Integrate and Dump
Symbol Mapping
Symbol Rate Adjust
Carrier Adjust
Baud Clock (270.833 kHz)
Output: Recovered Bits To Channel Decoder
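The per-block operation counts in Figure 7a can be turned into a rough processing load, which helps explain why the front-end blocks tend to be mapped differently from the back-end ones. The sketch below rests on assumptions not stated on the slide: each ops/sample bound is taken per input sample of its block, the resampler ratio is treated as exactly 2, and the three detectors are assumed to run at the resampler output rate.

```python
IF_RATE = 65e6  # samples per second from the RF subsystem

# (block name, assumed input sample rate, upper bound on ops per sample)
blocks = [
    ("Digital Down Converter", IF_RATE, 65),
    ("Resampler", IF_RATE / 30, 62),
    ("Quadrature Phase Detector", IF_RATE / 60, 100),
    ("Symbol Rate Detector", IF_RATE / 60, 200),
    ("Carrier Rate Detector", IF_RATE / 60, 100),
]

# Upper bound on operations per second for each block.
load = {name: rate * ops for name, rate, ops in blocks}

for name, ops_per_s in load.items():
    print(f"{name:26s} < {ops_per_s / 1e6:8.1f} Mops/s")
```

Under these assumptions the digital down converter alone is bounded by several billion operations per second, while each downstream block is bounded by a few hundred million at most.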

Example 4 Continued: Mapping 1

Figure 7b: Device Mapping 1

[Figure: The base architecture of Figure 7a, with the same blocks, partitioned between two devices.]
ASIC (Channelizer)
DSP or GPP (Channel Processor)

Example 4 Continued: Mapping 2

Figure 7c: Device Mapping 2

[Figure: The base architecture of Figure 7a, with the same blocks, partitioned between two devices.]
FPGA (Channelizer)
DSP or GPP (Channel Processor)

Example 4 Continued: Mapping 3

Figure 7d: Device Mapping 3

[Figure: The base architecture of Figure 7a, with the same blocks, partitioned between two devices.]
FPGA (Channelizer)
DSP or GPP (Channel Processor)

What Generally Goes Where

ASIC/FPGA DSP GPP

Digital Down Conversion/ Digital Up Conversion

Resampling Resampling

Equalization and Pre-emphasis Filtering

Carrier Synchronization Carrier Synchronization

Chip Rate/Code Synchronization

Symbol Synchronization Symbol Synchronization

Spread/Despread Modulation/Demodulation Modulation/Demodulation

Carrier Synchronization Interleaving/ De-Interleaving

Interleaving/ De-Interleaving

Symbol Synchronization Packet Framing Packet Framing

Diversity Combining Error Correction Coding/Decoding

Error Correction Coding/Decoding

Resampling Link Layer Processing

Note: System on Chip (SoC) technology may integrate all of the above

Real world example (Source: EE Times)

DSP for baseband signal processing

GPP for higher levels of the protocol stack

Hardware coprocessor (ASIC) for computationally expensive functions

Discussion